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Abstract 


Robots  would  be  much  more  useful  if  they  could  be  more  robust.  Systems  that  can  tol¬ 
erate  variability  and  uncertainty  are  called  robust  and  the  design  of  robust  feedback  con¬ 
trollers  is  a  difficult  problem  that  has  been  extensively  studied  for  the  past  several  decades. 
In  this  thesis,  we  aim  to  provide  a  quantitative  measure  of  performance  and  robustness 
in  control  design  under  an  optimization  framework,  producing  controllers  robust  against 
parametric  system  uncertainties,  external  disturbances,  and  unmodeled  dynamics.  Under 
the  "Hoc  framework,  we  formulate  the  nonlinear  robust  control  problem  as  a  noncooper¬ 
ative,  two-player,  zero-sum,  differential  game,  with  the  Hamilton-Jacobi-Isaacs  equation 
as  a  necessary  and  sufficient  condition  for  optimality.  Through  a  spectral  approximation 
scheme,  we  develop  approximate  algorithms  to  solve  this  differential  game  on  the  founda¬ 
tion  of  three  ideas:  global  solutions  through  value  function  approximation,  local  solutions 
with  trajectory  optimization,  and  the  use  of  multiple  models  to  address  unstructured  un¬ 
certainties.  Our  goal  is  to  introduce  practical  algorithms  that  are  able  to  address  complex 
system  dynamics  with  high  dimensionality,  and  aim  to  make  a  novel  contribution  to  robust 
control  by  solving  problems  on  a  complexity  scale  untenable  with  existing  approaches  in 
this  domain.  We  apply  this  robust  control  framework  to  the  control  of  humanoid  robots 
and  manipulation  in  tasks  such  as  operational  space  control  and  full-body  push  recovery. 
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Introduction 


1.1  Motivation 

Robots  would  be  much  more  useful  if  they  could  be  more  robust.  Systems  that  can  tolerate 
variability  and  uncertainty  are  called  robust  and  modern  robots  can  rarely  be  classified  as 
such.  In  this  thesis,  we  aim  to  provide  a  quantitative  measure  of  performance  and  robust¬ 
ness  that  leads  to  an  optimization  framework,  from  which  a  controller  can  be  designed  to 
robustly  control  the  behaviors  of  humanoid  robots.  We  develop  these  algorithms  on  the 
foundation  of  three  ideas:  global  solutions  through  value  function  approximation,  local 
solutions  with  trajectory  optimization,  and  the  use  of  multiple  models  to  address  unstruc¬ 
tured  uncertainties.  Our  goal  is  to  introduce  practical  algorithms  that  are  able  to  able  to 
address  complex  system  dynamics  with  high  dimensionality. 

The  key  design  philosophy  of  this  thesis  is  the  idea  of  design  driven  robustness,  as 
opposed  to  data  driven  approaches  commonly  found  in  adaptive  and  learning  systems.  In 
this  work,  we  explicitly  formulate  sources  of  uncertainty,  both  internal  and  external  to  the 
system  dynamics,  and  achieve  robustness  in  a  top-down  manner.  This  is  distinct  from 
the  machine  learning  literature,  where  uncertainties  are  identified  and  learned  from  data, 
processed  bottom-up.  These  two  schools  of  design  are  not  incompatible  with  one  another 
and  this  work  aims  to  complement  existing  data  driven  approaches  to  robust  control. 

The  main  application  area  for  this  thesis  is  the  control  of  complex  mechanical  systems, 
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such  as  humanoid  robots.  Humanoid  robots  are  designed  to  imitate  the  manipulative, 
locomotive,  perceptive,  communicative,  and  cognitive  abilities  of  humans  [3].  On  one 
hand,  humanoid  robots  are  the  ideal  platform  to  introduce  to  human  domains  since  they 
are  compatible  with  made-for-human  tools  and  environments.  On  the  other  hand,  any 
humanoid  robot  with  sufficient  physical  capabilities  poses  a  significant  safety  risk  to  nearby 
humans.  Dynamically  balancing  bipedal  humanoid  robots  are  particularly  dangerous  due 
to  their  high  center-of-mass  and  comparably  small  polygon  of  support.  While  they  are 
more  versatile  on  different  types  of  terrains  compared  to  wheeled  or  multi-legged  robots, 
robust  bipedal  locomotion  has  not  been  demonstrated  outside  of  the  laboratory  setting  as 
stability  can  easily  be  influenced  by  a  number  of  factors.  Before  we  can  expect  to  move 
these  robots  beyond  the  settings  of  research  and  into  human  populated  environments, 
the  issue  of  robustness  must  be  addressed  and  it  is  an  important  component  in  physical 
human-robot  interaction. 

For  a  humanoid  robot  with  many  degrees-of-freedom,  designing  behaviors  by  hand  is 
a  difficult,  if  not  intractable,  task.  While  there  have  been  attempts  to  use  motion  capture 
[4]  to  extract  or  transfer  human  motion  to  robot  control,  the  significant  differences  in  the 
kinematic  structures  of  humans  and  humanoid  robots  make  this  approach  difficult  to  apply 
in  practice.  Instead,  a  philosophy  for  the  control  of  humanoid  robots  is  optimization.  By 
defining  appropriate  optimization  objectives  in  conjunction  with  a  model  of  the  system 
dynamics  and  associated  constraints,  emergent  behaviors  can  be  generated  automatically. 

While  optimization  and  optimal  control  theory  have  been  widely  applied  in  humanoid 
robot  control,  it  is  not  without  drawbacks.  A  blind  application  of  optimization  tends  to 
yield  extremal  solutions  that  exploit  the  given  model  to  improve  performance.  However, 
roboticists  rarely  have  models  that  characterize  the  response  of  the  robot  with  absolute  ac¬ 
curacy,  especially  given  a  complex,  unstable,  high  degree-of-freedom  humanoid.  Typically, 
the  theoretical  dynamics  of  the  robot  are  obtained  through  the  principles  of  mechanics, 
with  experimental  procedures  that  validate  the  basic  equations  of  motion  and  refine  the 
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parameters.  These  experiments  are  corrupted  by  noise  and  the  identified  system  parame¬ 
ters  are  only  known  with  limited  degrees  of  accuracy.  Hydraulic  robots,  in  particular,  have 
oil  seal  friction  and  leakage,  which  causes  parameters  to  vary  with  each  movement.  Over 
time,  degradation  and  wear  also  alter  the  parameters  of  the  system,  and  unless  the  robot 
is  calibrated  periodically  to  re-identify  the  parameters,  the  model  can  deviate  significantly 
from  reality. 

Furthermore,  as  humanoid  robots  are  introduced  to  cluttered  and  uncontrolled  human 
environments,  there  are  various  external  factors  that  can  adversely  affect  the  system.  The 
robot  is  limited  by  its  perceptual  capabilities  and  is  vulnerable  to  changes  in  the  terrain  as 
well  as  actions  of  human  workers  nearby.  Since  external  disturbances  such  as  being  pushed 
cannot  be  predicted  accurately,  it  is  crucial  that  the  safety  performance  of  the  robot  be 
guaranteed  for  worst-case  scenarios.  Contact  properties,  similar  to  seal  friction,  change 
with  every  move.  Together,  addressing  the  accuracy  of  the  models  and  associated  imper¬ 
fections,  as  well  as  external  disturbances,  will  ensure  that  the  performance  of  theoretically 
optimal  designs  will  be  maintained  in  practice. 

In  control  theory,  robust  control  is  an  active  field  which  addresses  many  issues  in  de¬ 
signing  feedback  controls  to  account  for  model  uncertainties  and  external  disturbances. 
Historically,  robust  control  theory  is  rooted  in  analysis  in  the  frequency  domain  that  is 
specific  to  the  control  of  linear  dynamic  systems.  While  recent  works  in  this  area  have 
began  to  address  the  control  of  nonlinear  systems  in  the  time  domain,  most  rely  on  analyt¬ 
ically  manipulating  simple  system  dynamics  to  design  feedback  controls,  which  render  them 
inapplicable  to  more  complex  systems.  For  a  general  class  of  nonlinear  systems,  necessary 
and  sufficient  conditions  can  be  obtained  in  terms  of  the  Hamilton- Jacobi-Isaacs  equation 
(HJI),  a  partial  differential  equation  that  is  difficult  to  solve  and  does  not  admit  a  smooth 
analytical  solution  in  general.  Numerical  techniques  based  on  dynamic  programming  solve 
the  HJI  by  discretizing  the  state  space  of  the  system  on  a  grid,  but  suffer  from  the  curse 
of  dimensionality  and  require  extensive  computational  resources  to  pre-compute  policies 
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offline. 

In  the  next  section,  we  review  existing  methods  for  nonlinear  robust  control  and  discuss 
how  they  relate  to  the  proposed  approach. 


1.2  Background  and  related  work 

1.2.1  Classical  robust  control 

In  control  theory,  work  on  robust  control  originated  in  the  1970s,  motivated  by  the  in¬ 
vention  of  the  Linear-Quadratic  Gaussian  (LQG)  controller.  An  extension  of  the  Linear- 
Quadratic  Regulators  (LQR),  LQG  models  linear  systems  disturbed  by  additive  Gaussian 
noise,  subject  to  quadratic  costs.  LQG  explicitly  model  the  disturbance  and  should  have 
superior  performance  compared  to  LQR,  but  in  fact,  it  was  shown  by  Doyle  [5]  that  LQG 
techniques  provide  no  global  system-independent  guaranteed  robustness  properties.  The 
LQG  case  showed  the  need  for  a  systematic  way  of  demonstrating  robustness  margins  for 
control  designs  and  incorporating  them  into  controller  synthesis. 

The  Hoo  optimal  control  framework  was  developed  in  response  to  the  need  of  address¬ 
ing  modeling  errors  and  the  worst-case  responses  of  a  system.  Instead  of  optimizing  the 
usual  performance  metrics  such  as  rise  time,  settling  time,  energy,  etc,  'HO0  control  focuses 
on  sensitivity  minimization,  which  keep  the  response  of  a  system  under  bounded  distur¬ 
bances.  Work  by  Zames  [6]  was  representative  in  the  formulation  of  sensitivity  reduction 
by  feedback  in  the  framework,  and  was  extended  by  Francis  [7].  The  core  idea  in 
these  studies  is  the  'Hoc-norm,  a  measure  of  the  worst-case  response  of  a  system  seen  as 
the  highest  gain  value  on  a  Bode  magnitude  plot.  Minimizing  the  "Hoo  norm  is  therefore  an 
optimization  problem,  and  when  combined  with  traditional  feedback  designs,  constitute  a 
coherent  framework  in  which  controller  behaviors  can  be  guaranteed.  For  the  history  of 
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'Hoo  control  and  a  comprehensive  review,  we  refer  the  readers  to  several  well-known  text¬ 
books  in  this  area  by  Bhattacharyya  [8],  Zhou  [9],  Skogestad  [10],  and  Dullerud  [11],  as 
well  as  the  references  therein. 

While  classical  robust  control  theory  in  the  form  of  Hoc  control  has  been  well-established, 
it  is  not  directly  applicable  to  humanoid  robots  since  traditional  'H00  designs  rely  on  fre¬ 
quency  domain  analysis  techniques  for  linear  systems,  while  the  dynamics  for  humanoids 
are  highly  nonlinear.  Nonlinear  systems  do  not  have  properties  such  as  superposition,  scal¬ 
ing,  and  homogeneity  [12],  found  in  their  linear  counterparts,  which  means  that  tools  from 
linear  algebra  such  as  the  Laplace  transform  cannot  be  used.  Analysis  of  nonlinear  sys¬ 
tems  are  made  more  difficult  by  phenomenon  such  as  finite  escape  time,  multiple  isolated 
equilibria,  limit  cycles,  and  chaos.  Control  design  for  nonlinear  systems  are  synthesized 
entirely  in  the  time  domain,  which  require  different  techniques  to  derive  the  feedback  con¬ 
trol  law,  particularly  when  the  issue  of  robustness  is  considered.  While  the  'HO0  norm  can 
be  formulated  in  the  time  domain  and  its  minimization  remains  an  optimization  problem, 
the  solution  is  much  more  difficult  to  obtain  due  to  the  nonlinear  nature  of  the  systems 
involved. 

1.2.2  Dynamic  programming 

We  began  this  review  of  modern  nonlinear  robust  control  with  the  dynamic  programming 
principle  developed  by  Bellman  [13].  The  Bellman  equation,  a  central  result  in  dynamic 
programming,  is  a  necessary  and  sufficient  condition  for  optimality  which  discretizes  and 
solves  a  given  optimization  problem  recursively.  The  use  of  the  Bellman  equation  in  a 
continuous-time  nonlinear  optimal  control  problem  [14]  results  in  the  Hamilton-  Jacobi- 
Bellman  (HJB)  equation  [15],  which  applies  to  the  most  general  form  of  nonlinear  dynamics 
and  constraints.  The  HJB  is  a  nonlinear  first-order  partial  differential  equation,  and  when 
solved  over  the  entire  state  space  of  a  system,  is  a  necessary  and  sufficient  condition  for 
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optimality.  The  solution  to  the  HJB,  the  value  function,  gives  the  optimal  cost-to-go  for  a 
given  system  and  there  is  a  relationship  between  the  value  function  of  an  optimal  control 
problem  and  the  feedback  control  law. 

In  the  context  of  nonlinear  robust  control  with  an  external  disturbance,  the  optimiza¬ 
tion  problem  constitutes  a  minimax  problem  in  the  sense  that  we  try  to  minimize  the 
objective  function  under  the  worst  possible  disturbances  or  parameter  variations,  which 
maximizes  the  same  objective  function.  The  minimax  problem  also  has  a  game  theoretic 
interpretation  as  a  two-player,  noncooperative,  zero-sum  differential  game  [16].  This  prob¬ 
lem  is  a  game  in  that  a  minimizing  player  seeks  to  determine  control  inputs  that  minimize 
the  objective  function,  while  an  adversarial  maximizing  player  seeks  to  maximize  the  same 
objective  function.  It  is  a  differential  game,  since  both  players  are  constrained  by  the 
system  dynamics,  a  set  of  ordinary  differential  equations,  as  well  as  associated  constraints. 
Furthermore,  the  game  is  non-cooperative  as  each  player  makes  decisions  independent  of 
the  other  player.  The  outcome  is  zero-sum ,  since  one  player’s  gain  is  exactly  balanced  by 
the  other  player’s  loss. 

In  the  context  of  game  theory,  the  equilibrium  strategy  for  this  differential  game  consti¬ 
tutes  a  Nash  equilibrium  [17],  also  called  a  saddle-point,  which  can  be  intuitively  explained 
as  a  pair  of  strategies  secured  against  any  attempt  by  one  player  to  unilaterally  change  his 
strategy.  In  a  continuous-time  setting,  the  application  of  the  dynamic  programming  princi¬ 
ple  to  a  differential  game  results  in  the  Hamilton- Jacobi- Isaacs  (HJI)  equation  [15],  which 
can  be  seen  as  the  dual  to  the  HJB.  It  is  also  a  first-order  nonlinear  partial  differential 
equation  and  serves  as  the  necessary  and  sufficient  condition  for  optimality.  We  refer  the 
reader  to  Ba§ar  [18],  Engwerda  [19],  and  Friesz  [20]  for  a  detailed  treatment  of  differential 
games  in  the  context  of  optimization  and  optimal  control.  For  subsequent  discussions,  we 
will  use  the  terms  robust  control,  minimax  control,  and  differential  games  interchange¬ 
ably.  We  will  mostly  ignore  more  detailed  classifications  of  differential  games  as  solution 
approaches  tend  to  be  valid  for  all  forms  of  differential  games  with  minor  modifications. 
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While  the  formulation  of  the  nonlinear  robust  control  problem  is  straightforward,  the 
difficulty  in  solving  the  HJI  equation  remains  the  biggest  bottleneck  to  the  practical  ap¬ 
plication  of  nonlinear  "Hoo  control.  For  general  nonlinear  systems,  no  smooth  analytical 
solution  can  be  expected.  One  approach  to  solving  the  HJI  is  the  notion  of  viscosity  so¬ 
lution  [15].  When  written  in  the  form  of  linear  differential  operators,  the  HJI  satisfy  an 
ellipticity  condition  on  the  monotonicity  in  the  diffusion  operator.  As  a  result,  existence 
and  uniqueness  of  viscosity  solutions  can  be  demonstrated  in  the  form  of  a  value  function. 
However,  the  viscosity  approach  can  only  be  applied  to  a  small  subset  of  problems  and 
general  nonlinear  solutions  remain  intractable.  Methods  from  numerical  analysis,  such  as 
Galerkin’s  method,  can  be  used  to  convert  the  HJI  from  a  continuous  operator  form  to  a 
discrete  problem.  Representative  references  in  this  area  include  Georges  [21],  Beard  [22] 
[23]  [24],  Alamir  [25],  and  Ferreira  [26].  The  drawback  of  Galerkin-based  approaches  is 
the  need  to  successively  produce  discrete  forms,  which  is  difficult  to  implement  in  practice. 
Related  approximate  methods  such  as  Tsiotras  [27]  considered  the  use  of  Taylor  series  ap¬ 
proximation  of  the  value  function  to  iteratively  solve  the  HJI  term  by  term,  provided  that 
a  solution  to  the  linearized  model  of  the  nonlinear  system  exists.  Similarly,  Grimble  [28] 
proposed  a  version  of  the  H0 0  design  in  which  the  system  is  modeled  as  a  combination  of 
a  linear  and  a  nonlinear  subsystem,  where  the  plant  can  have  severe  non-smooth  nonlin¬ 
earities  but  the  disturbance  and  reference  models  are  assumed  linear.  These  approaches 
do  not  have  a  closed  form  solution  and  are  not  guaranteed  to  converge. 

More  recently,  Feng  [29]  [30]  proposed  an  iterative  algorithm  to  solve  the  HJI  for  a 
broad  class  of  nonlinear  system  by  solving  a  sequence  of  HJBs  whose  solutions  can  be 
approximated  recursively  by  existing  methods,  with  a  local  quadratic  rate  of  convergence. 
However,  as  the  solution  of  HJBs  remain  difficult  to  obtain,  this  approach  is  only  the 
beginning  of  a  potential  direction  to  solve  nonlinear  robust  control  problems.  Finally,  more 
heuristic  approaches  have  been  proposed  by  Kim  [31],  which  augments  a  nominal  control 
design  using  a  PID-type  auxiliary  input,  which  adds  "Hoo-type  performance  measures  to 
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achieve  additional  robustness. 

There  exists  a  rich  literature  on  the  solution  to  the  nonlinear  'H00  problem,  in  control 
theory,  differential  game,  Hamilton- Jacobi  theory,  and  various  other  fields.  We  refer  the 
readers  to  Helton  [32] and  Aliyu  [33],  as  well  as  the  references  therein,  for  an  overview  of 
various  techniques  in  this  area.  To  date,  no  effective  analytical  solutions  to  the  HJI  exists. 
Instead,  we  focus  on  reviewing  numerical  solutions  to  the  HJI,  and  also  other  approximate 
solutions  to  "Hoc  control  which  do  not  rely  on  solving  the  HJI. 

The  chief  numerical  algorithms  for  dynamic  programming  problems  are  policy  iteration 
(also  known  as  Howard’s  algorithm)  and  value  iteration  (also  known  as  backward  induction) 
[34],  These  algorithms  play  an  important  role  in  the  study  of  Markov  decision  processes 
(MDPs)  [35]  and  reinforcement  learning  ( RL )  [36].  When  solved  on  a  grid  over  some 
bounded  region  of  the  state-space,  the  number  of  grid  points  grows  exponentially  to  the 
dimension  of  the  state-space.  Few  successful  implementations  of  these  numerical  schemes 
have  been  demonstrated  for  systems  beyond  a  few  states. 

1.2.3  Approximate  dynamic  programming 

An  extension  to  the  numerical  approaches  to  dynamic  programming  is  approximate  (or 
adaptive)  dynamic  programming  (ADP).  ADP  encompasses  an  umbrella  of  methods  re¬ 
lated  to  dynamic  programming,  called  forward  /  incremental  /  iterative  /  heuristic  / 
neurodynamic  programming,  and  other  variations.  We  refer  readers  to  Powell  [37]  for 
a  comprehensive  review  of  these  methods.  In  short,  the  idea  of  value  function  approxima¬ 
tion  is  central  to  ADP,  where  the  true  Bellman  value  function  is  replaced  by  a  statistical 
approximation  followed  by  modifications  to  the  iterative  procedure  that  updates  this  ap¬ 
proximate  function.  A  related  method  to  value  function  approximation  is  policy  search, 
where  the  parameters  in  the  parametric  representation  of  the  policy  is  searched  directly  for 
a  solution  that  satisfies  the  control  objectives.  A  mixture  of  value  function  approximation 
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and  policy  search  is  often  referred  to  as  an  actor-critic  design  [38]  in  the  sense  that  an 
actor  directly  optimizes  the  controller  while  a  critic  observes  the  performance  of  a  given 
controller  based  on  a  particular  value  function  and  then  derives  an  improved  controller 
by  updating  the  value  function.  When  an  explicit  representation  of  both  the  policy  and 
the  value  function  is  maintained,  then  both  the  actor  and  critic  act  simultaneously  in  an 
actor-critic  design.  The  actor-critic  architecture  is  widely  used  in  reinforcement  learning 
and  we  refer  the  readers  to  Busoniu  [39]  for  a  recent  survey  on  the  use  value  function 
approximation  in  this  area. 

There  exists  an  extensive  literature  of  using  ADP  to  control  dynamic  systems,  partic¬ 
ularly  using  neural  networks  as  function  approximators.  This  stems  from  the  fact  that 
any  feedforward  network  with  a  single  hidden  layer  containing  a  finite  number  of  neurons 
(in  the  simplest  case,  a  multilayer  perceptron)  can  approximate  any  continuous  functions 
with  mild  assumptions  on  the  activation  function  [40].  While  there  are  other  alternative 
universal  approximators  (for  example,  support  vector  machines  [41]),  neural  networks  re¬ 
main  popular  for  optimal  control  problems  due  to  the  availability  of  formal  convergence 
properties  for  certain  configurations.  We  refer  the  readers  to  Miller  [42]  for  a  comprehen¬ 
sive  review  of  neural  network  methods  for  optimal  control  problems  and  Zhang  [43]  for 
ADP-specific  methods.  We  focus  the  subsequent  discussion  on  ADP  for  'Hoc  design  and 
two-player  differential  games,  with  ADP  predominantly  take  the  form  of  actor-critic  archi¬ 
tecture.  Approaches  have  been  proposed  for  systems  of  various  configurations,  including 
linear  and  nonlinear  dynamics  that  are  known,  partially  known,  or  unknown.  Some  ap¬ 
proaches  account  for  constraints  on  the  dynamics,  while  others  have  the  ability  to  learn  in 
an  online  fashion. 

The  simplest  application  of  ADP  is  for  linear  'H00  control  with  known  dynamics.  Al- 
Tamimi  [44]  proposed  a  heuristic  dynamic  programming  approach  where  action  and  critic 
networks  were  used  to  solve  for  the  value  function  and  the  costate  of  a  linear  discrete¬ 
time  zero-sum  game.  The  approach  can  be  seen  as  a  way  of  solving  the  Riccati  equation 
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that  arises  from  the  linear  game.  Li  [45]  extended  this  approach  to  achieve  model-free 
learning  assuming  the  underlying  dynamics  to  be  a  completely  unknown  linear  continuous¬ 
time  system.  In  this  approach,  one  critic  network  and  two  action  networks  were  used 
to  approximate  the  value  function,  control  and  disturbance  policies,  with  a  least  squares 
method  for  estimating  the  unknown  parameters. 

Similar  actor-critic  methods  can  be  applied  to  nonlinear  systems  as  well.  Representa¬ 
tive  approaches  include  Liu  [46] ,  where  three  neural  networks  were  used  to  approximate  the 
value  function,  control  input,  and  disturbance,  respectively,  and  updated  using  a  greedy 
heuristic  dynamic  programming  algorithm  in  conjunction  with  discrete-time  affine  non¬ 
linear  dynamics.  Convergence  can  be  proven  for  the  scheme  and  tracking  control  can  be 
accomplished  through  a  system  transformation.  Similarly,  Zhang  [47]  proposed  an  iterative 
ADP  method  with  four  action  networks  and  two  critic  networks.  Other  methods  have  fo¬ 
cused  on  simplifying  the  architecture  by  reducing  the  number  of  function  approximations. 
In  approaches  such  as  Dierks  [48] ,  a  single  approximator  based  scheme  was  used  to  achieve 
optimal  regulation  and  tracking  control  for  an  affine  system.  Zhang  [49]  used  single  critic 
network  for  each  player  instead  of  the  actor-critic  dual  network  for  a  non-zero-sum  game. 

While  the  most  common  forms  of  the  HJI  do  not  account  for  constraints  on  the  states 
and  control  inputs,  some  ADP  approaches  can  be  modified  to  include  these  constraints. 
Representative  methods  include  Abu-Khalaf  [50],  where  a  neurodynamic  programming 
algorithm  in  conjunction  with  a  two-player  policy  iteration  was  applied  to  a  nonlinear 
affine  systems  with  input  saturations.  The  result  is  a  closed- form  representation  of  the 
feedback  strategies  and  the  value  function.  Related  work  by  Abu-Khalaf  [51]  solved  the 
HJI  by  decomposing  it  into  a  sequence  of  differential  equations  linear  in  the  cost  for  which 
closed-form  solutions  are  easier  to  obtain.  The  approach  is  combined  with  neural  network 
approximations  and  policy  iteration  to  derive  numerical  solutions. 

While  most  ADP  approaches  are  executed  offline  to  converge  to  a  value  function  and 
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a  corresponding  policy,  several  approaches  promised  the  ability  to  identify  solutions  in 
an  online  setting.  Assuming  completely  known  dynamics,  Vamvoudakis  [52]  proposed  a 
method  to  learn  online,  in  real  time,  an  approximate  local  solution  to  the  HJI.  Termed 
synchronous  ZS  game  policy  iteration,  this  approach  was  built  on  the  actor-critic  design 
and  used  simultaneous  continuous-time  adaptation  of  critic,  actor,  and  disturbance  neural 
networks.  With  the  ability  to  learn  online,  it  is  then  natural  to  extend  'H00  designs  with 
dynamics  that  are  only  partially  known  or  completely  unknown.  Notable  approaches  in¬ 
clude  Mehraeen  [53]  with  an  iterative  approach  using  neural  networks  that  approximated 
a  linear  value  function  for  discrete-time  nonlinear  systems  with  partially  unknown  internal 
system  dynamics,  approximated  by  an  additional  neural  network.  Zhang  [54]  presented 
the  learning  of  a  completely  unknown  nonlinear  system  in  a  data-driven  approach,  where 
the  dynamics  is  essentially  replaced  by  available  input-output  data.  In  Wu  [55],  an  online 
simultaneous  policy  update  algorithm  was  used  for  a  system  with  unknown  dynamics.  Fi¬ 
nally,  Luo  [56]  proposed  an  off-policy  reinforcement  leaning  design  to  learn  the  solution  of 
the  Issacs  equation  from  data,  under  a  neural  network  based  actor-critic  structure. 

Since  no  approximation  strategy  works  for  all  optimization  problems  without  modifi¬ 
cation,  ADP-based  designs  are  highly  domain  specific.  While  most  existing  applications 
of  ADP  in  robust  control  utilized  neural  networks  due  to  their  universal  function  approx¬ 
imation  ability,  these  approaches  are  not  without  drawbacks.  In  practice,  while  a  single 
hidden  layer  neural  network  can  approximate  any  function,  it  is  often  inefficient.  This  is 
due  to  the  fact  that  neurons  may  need  to  be  allocated  to  every  small  volume  of  the  input 
space.  As  the  number  of  such  small  volumes  grows  exponentially  in  the  dimensionality  of 
the  input  space,  convergence  is  rendered  exponentially  slow. 

A  remedy  for  this  problem  exists  in  machine  learning  in  the  form  of  so-called  deep  neural 
networks  or  deep  learning  [57],  where  a  many-layered  feedforward  neural  network  is  trained 
one  layer  at  a  time,  treating  each  layer  as  an  unsupervised  restricted  Boltzmann  machine, 
then  using  supervised  backpropagation  for  fine-tuning.  Though  a  deep  architecture  is  not  a 
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universal  approximator  unless  it  is  also  exponentially  large,  it  remains  a  viable  alternative 
to  traditional  neural  networks  for  many  applications.  So  far,  these  deep  and  recurrent 
architectures  have  been  used  for  optimal  control  [58],  but  not  applied  to  robust  control. 

Reinforcement  learning  and  learning-based  solve  the  same  robust  control  problem,  but 
using  a  different  underlying  philosphy.  Robust  control,  as  formulated  under  the  'Hpafty 
approach,  explicitly  models  potential  sources  of  uncertainties,  both  internal  and  external 
to  the  system.  This  is  a  design-drive ,  or  top-down  approach.  Learning-based  approaches, 
on  the  other  hand,  elect  to  assume  no  existing  knowledge  of  the  underlying  dynamics,  and 
instead  explores  the  state  space  by  perturbing  the  system.  This  is  a  data-driven ,  or  bottom- 
up  approach.  When  the  level  of  uncertainties  in  the  system  is  small  and  isolated,  then  a 
design-driven  approach  can  often  simply  the  design.  If  there  are  extensive  uncertainties 
that  are  hard  to  model,  then  a  data-driven  approach  would  have  an  advantage. 

1.2.4  Local  value  function  approximations 

The  philosophy  of  most  ADP-based  techniques  reviewed  in  Section  1.2.3  is  to  use  universal 
function  approximators,  which  are  able  to  converge  to  the  true  value  function  given  enough 
computational  resources.  An  alternative  approach  is  to  limit  the  order  of  the  approximated 
value  function  in  order  to  gain  computational  efficiency. 

In  this  approach,  a  simple  way  to  approximate  the  value  function  is  by  defining  poly¬ 
nomial  basis  functions  and  form  an  optimization  problem  to  search  for  the  associated 
coefficients.  This  is  still  done  in  the  context  of  the  Bellman  equation,  which  can  be  relaxed 
to  an  inequality,  which  the  polynomial  approximation  must  satisfy.  This,  by  definition, 
provides  an  underestimate  of  the  value  function.  An  example  of  this  strategy  can  be  found 
in  Wang  [59],  which  utilized  a  quadratic  approximation  in  its  ADP  implementation.  When 
viewed  in  the  context  of  general  low-order  approximations  to  the  value  function,  the  spec¬ 
trum  of  techniques  known  as  sum- of- squares  (SOS)  optimization  and  differential  dynamic 


12 


Jiuguang  Wang 


Numerical  Nonlinear  Robust  Control  with  Applications  to  Humanoid  Robots 


programming  (DDP)  can  be  classified  under  the  broad  family  of  ADP  algorithms,  though 
with  the  distinction  of  producing  only  locally  optimal  solutions. 

In  SOS,  convex  optimization  [60]  is  used  to  automatically  search  for  Lyapunov  functions 
to  prove  the  stability  of  nonlinear  dynamic  systems  in  a  polynomial  form.  SOS  is  based  on 
the  idea  of  positivity  of  polynomials,  which  was  shown  by  [61]  to  form  a  convex  semidefinite 
programming  (SDP)  problem  [62].  Since  a  Lyapunov  function  is  a  matrix  polynomial  with 
constraints  on  positivity  (or  non- negativity) ,  SOS  is  a  natural  tool  in  the  verification  of 
Lyapunov  function  candidates.  An  associated  problem  is  the  estimation  of  the  domain 
of  attraction  (Do A),  which  defines  a  neighborhood  around  an  equilibrium  for  which  all 
trajectories  converge  to  the  equilibrium.  The  sub-level  set  of  a  Lyapunov  function  is 
related  to  an  inner  estimate  of  the  size  of  the  domain  of  attraction. 

Since  a  value  function  satisfies  all  the  properties  of  a  Lyapunov  function,  the  Lyapunov 
function  found  through  SOS  optimization  can  be  seen  as  a  quadratic  approximation  to 
the  value  function.  While  originally  developed  only  for  the  verification  of  stability,  SOS 
methods  were  extended  to  controller  syntheses  as  well.  The  simultaneous  search  for  a 
controller  and  the  associated  domain  of  attraction  can  be  formulated  as  a  single  SDP 
problem,  which  can  be  achieved  by  rewriting  conditions  from  basic  Lyapunov  stability 
theory  [12]  as  linear  matrix  inequalities  (LMIs)  [63].  By  assuming  a  quadratic  form  of  the 
Lyapunov  function,  the  resulting  problem  is  convex  and  can  be  readily  solved  using  interior 
point  methods  [64],  The  reader  is  referred  to  Chesi  [65]  for  a  comprehensive  review  of  SOS 
techniques  for  traditional  control  applications  and  we  focus  the  subsequent  discussions  on 
results  relevant  to  robust  control. 

There  exists  a  rich  literature  on  the  use  of  SOS  and  LMIs  for  the  control  of  uncertain 
systems  and  we  refer  the  reader  to  a  recent  survey  by  Petersen  [66]  on  the  subject.  There 
is,  however,  a  distinction  between  the  worst-case  uncertainty  assumed  by  Woo  designs  and 
the  bounded  uncertainty  models  used  by  SOS  approaches  such  as  Ichihara  [67]  and  Ma 
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[68] .  Where  these  approaches  assumed  a  bounded  uncertainty  (in  terms  of  disturbances  or 
parametric  uncertainty),  they  do  not  model  the  worst-case  and  hence  are  not  subject  to 
a  minimax  or  differential  game  interpretation.  Most  applications  of  LMIs  to  1-Loo  control 
focus  on  linear  systems,  such  as  in  Gahinet  [69],  who  introduced  solvability  conditions 
involve  Riccati  inequalities.  Poznyak  [70]  proposed  a  class  of  attractive  ellipsoids  methods 
to  construct  linear-type  feedback  control  synthesis  that  can  also  be  applied  to  nonlinear 
affine  systems  in  the  presence  of  uncertainties. 

For  general  nonlinear  systems,  a  convex  parameterization  of  nonlinear  Hoo  control  was 
given  by  Lu  [71]  in  terms  of  nonlinear  matrix  inequalities,  which  connected  the  solution 
to  Lyapunov  stability  but  did  not  provide  a  computationally  tractable  solution.  Further 
developments  by  Prempain  [72]  formulated  the  /^-analysis  problem  which  bypassed  the 
need  to  solve  the  HJI  with  a  new  problem  linear  in  the  Lyapunov  function  parameters. 
This  was  done  under  the  SOS  framework  with  convex  state-dependent  LMIs  for  for  an 
affine  nonlinear  system.  For  single-input  polytopic  systems,  Wu  [73]  proposed  necessary 
and  sufficient  conditions  for  verifying  robust  storage  functions  and  provided  conditions 
for  the  existence  of  continuous  robust  state  feedback  controllers.  More  recently,  Wei  [74] 
utilized  SOS  for  £2-gain  analysis  and  synthesized  state  feedback  Hoc  control  iteratively 
based  on  an  initial  controller  that  can  stabilize  the  system  to  a  finite  C2- gain.  Lofberg 

[75]  developed  robust  convex  programming  where  uncertain  semidefinite  and  second  order 
cone  constraints  were  employed  to  optimize  against  the  worst-case  cost.  Finally,  Zheng 

[76]  reformulated  the  HJI  inequalities  as  SDP  conditions  and  used  high  order  Lyapunov 
functions  in  conjunction  with  SOS  programming  to  obtain  output  feedback  laws  for  a  class 
of  nonlinear  systems  without  orthonormal  and  decoupling  assumptions. 

SOS  techniques  are  only  applicable  to  polynomial  systems.  For  systems  dynamics  that 
are  not  originally  polynomial,  for  example,  mechanical  systems  with  trigonometric  terms, 
a  high-order  Taylor  expansion  can  be  used  to  obtain  a  polynomial  representation.  This 
creates  additional  computational  burden  and  creates  inaccuracies  in  the  results.  In  general, 
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computational  efficiency  with  respect  to  dimensionality  is  a  main  drawback  of  SOS-based 
approaches.  While  SOS  is  not  subject  to  the  curse  of  dimensionality  in  the  traditional  sense, 
the  search  for  a  Lyapunov  function  remains  difficult  when  the  dimensionality  of  the  system 
is  high.  This  stems  from  the  fact  that  the  number  of  monomial  basis  functions  used  to 
construct  the  Lyapunov  function  explodes  exponentially  with  the  number  of  states.  While 
there  are  techniques  to  actively  prune  unnecessary  monomials  and  reduce  the  number  of 
optimization  parameters  (for  example,  the  method  of  Newton  polytopes  [77]),  it  leads  to 
a  trade-off  with  the  quality  of  the  resulting  approximation.  This  is  a  criticism  that  can  be 
extended  to  all  forms  of  function  approximators,  such  as  those  reviewed  in  Section  1.2.3. 

More  recently,  several  approaches  have  combined  forms  of  value  function  approxima¬ 
tion  with  trajectory  optimization  methods,  which  produce  locally  optimal  solutions  to  the 
underlying  optimization  problem.  Methods  based  on  differential  dynamic  programming 
(DDP)  [78],  for  example,  have  been  applied  to  solve  optimal  control  problems  [79],  with 
some  variants  [80]  specific  for  'H00  designs.  DDP  approximates  the  value  function  using 
locally  quadratic  models  and  uses  local  trajectory  optimizers  to  globally  optimize  a  policy 
and  associated  value  functions.  Another  related  class  of  method  is  the  so-called  path  in¬ 
tegral  controls  [81]  [82],  which  utilizes  an  exponential  transformation  of  the  Bellman  value 
function  to  reduce  the  stochastic  Hamilton- Jacobi-Bellman  equation  into  a  linear  form, 
which  can  then  be  solved  with  Monte  Carlo  sampling.  Path  integrals  and  related  formu¬ 
lations  using  direct  numerical  optimization  [83]  can  be  seen  as  a  trajectory-based  value 
function  approximation.  Finally,  LQR-trees  [84]  [85]  combines  SOS-based  stability  veri¬ 
fication  with  sampling-based  planners  under  the  rapidly-exploring  random  tree  paradigm 
[86],  which  creates  a  sparse  tree  of  trajectories  that  probabilistically  covers  the  entire  con¬ 
trollable  subset  of  state  space.  DDP,  path  integrals,  and  LQR-trees  all  have  variants  which 
deal  with  optimal  and  stochastic  control  problems,  but  so  far,  there  is  only  limited  work 
on  investigating  the  applicability  of  these  approaches  in  robust  control  problems. 
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1.2.5  Trajectory  optimization 

Dynamic  programming  (and  by  extension,  ADP)-based  approaches  to  optimal  and  robust 
control  yield  globally  optimal  solutions,  by  the  virtue  of  solving  the  full  necessary  and 
sufficient  conditions  for  optimality.  As  these  approaches  become  intractable  for  large  scale 
problems,  researchers  have  turned  to  more  relaxed  forms  of  optimization  which  produce 
only  local  solutions.  A  representative  approach  is  Pontryagin’s  minimum  ( also  maximum) 
principle  (PMP)  [14],  which  is  only  a  necessary  condition  for  optimality  when  satisfied 
along  a  trajectory.  Pontryagin  extended  the  classical  calculus  of  variations  [87]  to  consider 
inequality  constraints  in  the  control  inputs,  and  in  particular,  to  cases  where  the  control 
inputs  are  bounded. 

In  short,  the  PMP  stated  that  the  Hamiltonian  must  be  minimized  over  the  set  of  all 
permissible  controls  within  their  bounded  region  at  each  point  along  the  path.  This  is  an 
observation  consistent  with  dynamic  programming,  which  minimizes  the  Hamiltonian  at 
each  point  in  the  state  space.  The  PMP  also  defined  the  relationship  of  the  state  variable 
with  the  costate ,  also  known  as  the  adjoint ,  which  are  the  Lagrange  multipliers  used  in 
the  constrained  optimization.  For  simple  problems,  the  optimal  controls  can  be  solved  as 
functions  of  the  costate.  For  more  complex  problems,  the  solution  of  the  costate  equa¬ 
tions  and  the  associated  optimal  controls  form  a  two-point  boundary-value  problem  (BVP) 
[88],  which  can  be  solved  numerically  to  yield  an  open- loop  trajectory  to  the  underlying 
optimization  problem.  We  refer  the  reader  to  Bryson  [14]  and  Kirk  [89]  for  a  classical 
treatment  of  the  PMP  in  optimal  control. 

The  PMP  has  been  extended  to  differential  games  since  Pontryagin’s  time  [90].  Whereas 
dynamic  programming,  ADP,  and  other  local  value  function  approximations  reviewed  in 
previous  sections  are  closed-loop  solutions  to  the  differential  game,  the  PMP  gave  open- 
loop  admissible  controls.  Closed-loop  strategies  imply  that  the  strategy  is  a  function  of 
both  time  and  state,  which  changes  as  the  system  evolves,  based  on  the  current  state  of 
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the  system.  In  an  open-loop  solution  setting,  both  players  formulate  their  strategy  at  the 
moment  the  system  starts  to  evolve,  based  on  information  available  at  the  time:  the  system 
dynamics,  the  objective  function,  and  initial  conditions.  This  strategy  is  only  a  function 
of  time,  and  cannot  be  changed  once  the  system  evolves.  The  saddle-point  solution  is  the 
combination  of  strategies  of  both  players  which  are  secured  against  any  attempt  by  one 
player  to  unilaterally  change  his  strategy  (the  definition  of  a  Nash  equilibrium). 

In  the  context  of  robust  control  under  the  and  minimax  formulations,  a  represen¬ 
tative  work  is  Boltyansky  [91]  where  a  version  of  the  PMP  was  developed  for  the  minimax 
control  of  systems  with  unknown  parameters  from  a  given  finite  set.  This  work  was  gener¬ 
alized  by  Boltyanski  [92]  to  include  problems  with  Bolza  and  Lagrange  forms  of  objective 
functions.  A  recent  book  by  Boltyanski  [93]  summarized  the  developments  in  this  area. 

Algorithms  based  on  the  PMP  are  often  referred  to  as  indirect  trajectory  optimiza¬ 
tion,  owning  to  the  fact  that  an  extra  step  of  forming  the  necessary  conditions  is  done 
first  before  a  discrete  optimization  problem  is  obtained.  In  contrast,  direct  trajectory  opti¬ 
mization  transcribes  a  continuous-time  optimal  control  problem  directly  into  an  equivalent 
nonlinear  programming  problem  (NLP)  [94],  This  is  done  by  parameterizing  the  state  and 
control  spaces  using  a  parametric  representation  and  solving  the  resulting  optimization  by 
satisfying  the  constraints  at  particular  collocation  points.  This  is  an  idea  borrowed  from 
the  numerical  solution  of  differential  equations  [95],  for  which  the  collocation  method  is 
a  well-established  approximate  method.  The  use  of  collocation  in  optimal  control  is  a 
a  NLP  problem,  which  can  in  turn,  be  converted  to  a  sequential  quadratic  programming 
(SQP)  problem  [96]  and  solved  with  a  numerical  optimization  solver  such  as  SNOPT  [97] 
to  produce  the  locally  optimal  solutions. 

One  of  the  most  popular  direct  trajectory  optimization  tools  is  pseudospectral  meth¬ 
ods  [98].  It  is  a  class  of  direct  collocation  methods  where  the  state  and  control  vectors 
are  parametrized  by  Lagrange  polynomials  [99]  and  collocating  the  differential-algebraic 
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equations  using  nodes  obtained  from  a  Gaussian  quadrature.  Various  strategies  have  been 
proposed  for  the  form  of  the  collocation  points,  the  three  most  commonly  used  are  the 
Legendre-Gauss(LG),  Legendre-Gauss-Radau  (. LGR ),  and  Legendre-Gauss-Lobatto  (LGL) 
points,  which  are  obtained  from  the  roots  of  a  Legendre  polynomial,  belonging  to  a  more 
general  class  of  Jacobi  polynomials  [100].  Orthogonal  polynomial  enjoy  the  property  of 
spectral  accuracy  [101],  which  states  that  the  approximation  enjoys  an  exponential  conver¬ 
gence  as  the  degree  of  the  polynomial  is  increased,  and  hence  avoid  the  so-called  Runge’s 
phenomenon  [101].  Other  forms  of  polynomial  approximations,  such  as  splines,  RBFs,  and 
wavelets,  do  not  have  spectral  accuracy.  In  addition,  as  the  derivatives  of  orthogonal  poly¬ 
nomials  can  be  written  in  terms  of  the  same  orthogonal  polynomials,  mechanical  systems 
in  terms  of  position,  velocity,  acceleration  can  be  more  compactly  represented. 

Comparing  direct  and  indirect  methods  for  trajectory  optimization,  it  can  be  noted  that 
both  require  initial  guesses  for  the  solutions  to  be  in  the  neighborhood  of  the  optimum  in 
order  to  produce  good  results.  Indirect  methods  tends  to  enjoy  more  accurate  overall 
solution  with  assurances  on  its  local  optimality.  But  forming  the  analytical  expressions  for 
the  necessary  conditions  of  optimality  can  be  infeasible  for  large  scale  nonlinear  dynamics, 
where  direct  methods  have  an  advantage.  Direct  methods  can  incorporate  state  and  input 
constraints  into  the  optimization  framework,  in  addition  to  enjoying  more  readily  available 
solvers  as  well  as  being  more  robust  in  convergence  with  respect  to  inaccurate  initial  guesses. 
A  broad  overview  of  numerical  methods  in  optimal  control  can  be  found  in  Betts  [94]  and 
Subchan  [102] 

Another  advantage  of  trajectory  optimization  based  solutions  to  optimal  control  is 
the  ability  to  deal  with  under- actuated  systems,  which  has  less  number  of  control  inputs 
compared  to  the  total  number  of  degrees  of  freedom  for  the  system.  Unlike  analytical  tools 
in  nonlinear  control  design,  an  optimization-based  approach  can  determine  the  proper 
allocation  of  control  inputs  to  the  system,  inherently  handling  the  under-actuation.  An 
example  of  this  can  be  found  in  [103]. 
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While  direct  trajectory  optimization  has  been  successful  in  addressing  optimal  control 
problems,  applying  it  to  robust  control  is  not  straightforward  and  a  naive  implementation 
tends  to  result  in  an  expensive  optimization  problem.  To  the  best  of  our  knowledge, 
no  successful  implementation  of  direct  trajectory  optimization  has  been  demonstrated  for 
robust  control  problems. 

1.2.6  Receding  horizon  control 

One  outstanding  issue  in  both  optimal  and  robust  control  is  how  an  open-loop  solution 
to  the  optimization  problem  can  be  applied  in  a  closed-loop  setting.  While  open-loop 
solutions  are  admissible  control  inputs  and  obey  the  dynamics,  they  cannot  compensate 
for  responses  of  the  system  that  differ  from  the  predicted  model.  Doing  so  would  require 
feedback,  which  is  the  basis  for  all  closed- loop  systems. 

One  popular  approach  in  applying  open-loop  optimization  is  receding  horizon  control 
(RHC),  also  known  as  model  predictive  control  (MPC).  In  a  RHC  setting,  an  optimization 
problem  is  solved  and  applied  over  a  small  planning  horizon.  As  the  response  of  the  system 
diverges  from  the  predicted  model,  a  new  optimization  problem  is  solved  and  the  process 
is  repeated  until  convergence.  The  advantage  of  RHC  compared  to  traditional  control 
designs  lies  in  its  ability  to  iteratively  apply  simple  controllers  to  solve  more  complex  prob¬ 
lems.  Similar  to  how  constrained  optimal  control  problems  can  be  tackled  by  iteratively 
solve  a  series  of  unconstrained  ones  [104],  the  receding- horizon  formulation  can  be  used 
to  design  closed-loop  controls  with  open-loop  trajectories.  Franz  [105]  computed  B-spline 
parameterized  trajectories  with  a  SQP-based  method,  and  applied  the  trajectories  in  a 
RHC  formulation.  A  survey  by  Diehl  [106]  reviewed  related  methods  with  Newton  type 
optimization  methods  and  SQP  in  the  context  of  RHC  for  optimal  control. 

We  refer  readers  to  Kwon  [107]  for  a  review  of  general  RHC  methods  in  optimal  control 
and  instead  focus  the  subsequent  discussion  on  the  use  of  RHC  in  robust  control  prob- 
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lems.  The  issue  of  robustness  in  RHC  can  be  considered  both  passively  and  actively.  A 
passive  approach  designs  a  RHC  with  nominal  stability  and  consider  stability  margins  for 
uncertainty  and  disturbances  in  retrospect.  As  shown  in  Magni  [108],  certain  RHC  designs 
are  optimal  in  the  sense  that  it  is  also  optimal  for  a  modified  optimal  control  problem 
spanning  over  an  infinite  horizon.  Therefore,  the  RHC  is  robust  with  respect  to  sector 
bounded  input  uncertainties,  a  property  inherited  as  the  sampling  time  goes  to  zero. 

An  active  approach  to  robust  RHC  directly  models  uncertainty  and  disturbance  in  an 
'Hoc  sense,  and  hence  is  more  relevant  to  our  discussion.  It  is,  however,  not  a  trivial  endeavor 
since  this  formulation  requires  solving  the  HJI  in  some  form.  Chen  [109],  Blauwkamp 
[110],  and  Magni  [111]  [112]  applied  a  game  theoretic  approach  to  nonlinear  RHC,  for 
which  open-loop  solutions  to  linearized  'H0o  subproblems  were  obtained  and  applied  in  a 
receding-horizon  fashion.  While  this  simplified  the  designs  considerably,  in  certain  cases,  no 
feasible  solution  can  be  found  at  all  as  one  input  signal  must  reject  all  possible  disturbances. 
This  is  a  limitation  of  the  linear  nature  of  the  design.  More  recently,  Yu  [113]  proposed 
a  LMI-based  solution  to  the  HJI,  on  a  linearized  model  of  the  original  nonlinear  system, 
which  guaranteed  /^-performance.  While  still  linear  in  nature,  it  is  an  improvement  on 
previous  open- loop  solutions.  Related  is  Gautam  [114],  which  presented  a  receding  horizon 
style  coordinated  control  framework  involving  dynamically  decoupled  subsystems  which 
are  required  to  reach  a  consensus  condition,  with  the  underlying  'H00  problem  solved  with 
LMIs. 

Overall,  robust  RHC  remains  a  largely  unsolved  problem.  This  is  due  to  the  fact  that 
solving  the  HJI,  even  in  a  limited  sense,  is  computationally  burdensome  and  cannot  be 
easily  applied  in  an  iterative  setting. 
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1.2.7  Multi- model  robust  control 

Given  that  nonlinear  'H0o  designs  in  terms  of  the  HJI  are  difficult  to  pursue,  a  heuristic 
approach  to  designing  robust  optimal  feedback  control  called  multi-model  control  evaluates 
the  performance  of  a  given  control  law  over  a  distribution  of  possible  models.  A  distribution 
of  models  allow  the  optimization  to  consider  the  expected  cost  over  a  range  of  possible 
outcomes  and  account  for  the  maximum  cost  (or  the  worst  case).  In  essence,  the  philosophy 
of  multi-model  control  does  not  fundamentally  differ  from  that  of  1-Loo  control,  though  in 
practice,  the  problem  structure  differs  significantly  from  traditional  Hoc  designs. 

The  use  of  a  distribution  of  models  has  been  used  in  a  number  of  different  setting  in 
robotics,  machine  learning,  and  state  estimation  and  control.  In  reinforcement  learning, 
for  example,  Cutler  [115]  proposed  the  use  of  multifidelity  simulators,  which  effectively 
is  a  distribution  of  possible  world  models  in  which  the  learning  is  based  on.  In  robotics, 
Cunningham  [116]  designed  the  controller  such  that  it  execute  a  policy  from  a  set  of 
plausible  closed-loop  policies,  derived  from  models  based  on  partially  observable  Markov 
decision  processes.  Ensemble  control  [117]  is  a  class  of  methods  which  model  a  family  of 
independent,  structurally  identical,  finite-dimensional  systems  with  variations  in  system 
parameters.  A  single  common  controller,  which  can  be  designed  under  an  optimal  control 
framework  [118],  steers  the  family  of  models  between  points  of  interests.  Ensemble  control 
has  been  applied  in  robotics  settings  [119],  but  is  restrictive  in  the  forms  of  structural 
parametric  uncertainties  that  can  be  modeled. 

In  multi-model  control  designs,  an  early  work  by  Varga  [120]  designed  optimal  output 
feedback  control  for  multi-model  linear-quadratic  systems.  More  rigorously,  Poznyak  [121] 
formulated  necessary  conditions  for  multiple  model  LQ  systems  using  the  maximum  prin¬ 
ciple,  which  was  recently  extended  in  [122]  and  Miranda  [123].  For  multi- model  nonlinear 
systems,  Azhmyakov  [124]  was  the  first  to  formulate  necessary  and  sufficient  conditions  us¬ 
ing  dynamic  programming.  A  recent  book  by  Boltanski  [93]  summarized  these  approaches 
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from  the  prospective  of  both  the  dynamic  programming  principle  as  well  as  Pontryagin’s 
minimum  principle.  Closely  related  is  Whitman  [125],  where  the  implementation  of  a 
dynamic  programming-based  control  design  for  the  multi-model  pendulum  swing-up  prob¬ 
lem  demonstrated  the  effectiveness  of  multi-model  designs  in  addressing  uncertainty.  While 
these  attempts  are  more  expressive  in  modeling  uncertainty  and  thus  more  robust  compared 
to  the  traditional  Woo  approach,  they  are  unattractive  from  a  computational  viewpoint: 
the  two-point  boundary  value  problem  arising  from  the  maximum  principle  is  difficult 
to  solve  while  the  dynamic  programming-based  approaches  are  ultimately  limited  in  the 
dimensionality  of  the  system  that  they  can  handle. 

Multi-model  design  are  also  used  in  trajectory  optimization  where  only  locally  optimal 
solutions  are  desired.  McNaughton  [126]  designed  the  system  CASTRO  using  a  direct 
collocation  based  trajectory  optimizer  DIRCOL  and  simultaneously  optimized  trajectories 
for  multiple  models,  each  using  different  estimates  for  the  system  parameters.  The  result¬ 
ing  system  was  demonstrated  on  a  two-link  pendulum  swing  up  problem.  More  recently, 
Atkeson  [127]  gave  a  policy  optimization  approach  where  first  and  second  order  analytic 
gradients  were  used  to  obtain  local  solutions  to  multi- model  dynamics.  While  these  ap¬ 
proaches  are  computationally  attractive,  they  lack  the  rigorous  theoretical  guarantees  for 
convergence  and  optimality.  Depending  on  the  specific  baseline  trajectory  optimization 
method  used,  it  can  also  be  difficult  to  implement  and  parallelize  the  algorithms. 


1.3  Formulation 

1.3.1  System  dynamics 

In  this  section,  we  describe  the  general  formulation  of  nonlinear  robust  control  as  an  op¬ 
timization  problem  that  will  be  used  throughout  the  remainder  of  the  thesis.  Consider  a 
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nonlinear  system 


x  =  f{x)  +  g{x)u{t ) 
z  =  [h(x),  u]T 

x(to)=x  0,  (1.1) 

with  x(t)  E  X  is  the  state  vector,  z(t)  E  Z  is  the  output  vector,  u(t)  E  U  is  the  vector  of 
controlled  inputs  to  the  system,  xq  is  a  vector  of  initial  states,  and  the  matrices  f(x),  g(x), 
and  h(x)  are  the  system  matrices  that  are  smooth  vector  fields  not  explicitly  parameterized 
by  time.  This  form  is  commonly  referred  to  as  the  control  affine  form  since  it  linear  in  the 
actions  but  nonlinear  with  respect  to  the  states.  We  assume,  without  a  loss  of  generality, 
that  the  origin  x(t)  =  0  is  an  equilibrium  of  the  system,  i.e.,  x(t)  =  0,Vt  E  [to,oo).  Given 
any  initial  states  x0,  we  call  u(x,t )  a  feedback  control  law  or  policy  that  is  an  explicit 
function  of  the  states.  When  u  is  not  parameterized  by  the  states,  then  we  call  x(t.  t0,  x0,  u) 
the  trajectory  of  the  system  from  initial  time  to  to  final  time  tf. 

In  robust  control,  we  wish  to  consider  potential  sources  of  uncertainty  that  significantly 
affect  the  performance  of  the  system.  Typically,  these  uncertainties  may  include 

•  External  disturbances:  external  perturbations  to  the  system  which  are  potentially 
time- varying  and  can  alter  the  equilibrium  state  of  the  system 

•  Parametric  uncertainty:  parameters  within  the  system  dynamics,  such  as  mass  and 
stiffness,  that  are  difficult  to  determine  with  complete  accuracy  or  known  to  vary  due 
to  degradation  and  wear  over  time 

•  Unmodeled  dynamics:  higher-order  dynamics  (such  as  actuator  dynamics)  that  has 
not  been  modeled  but  has  has  a  significant  effect  on  the  response  of  the  system  in 
real  life 
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Where  in  traditional  model-based  control  designs,  these  uncertainties  are  assumed  to  be 
addressed  by  the  feedback  controls  implicitly ,  we  wish  to  explicitly  design  the  controller  to 
account  for  these  uncertainties.  We  make  the  assumption  that  the  disturbances  are  bounded 
and  the  parametric  uncertainties  are  structured.  These  assumptions  and  the  corresponding 
modifications  to  the  system  dynamics  are  stated  below. 


1.3.2  Uncertainty  modeling 

To  model  an  external  disturbance,  we  assume  that  the  effect  of  the  disturbance  on  the  sys¬ 
tem  dynamics  is  structurally  additive.  We  alter  the  dynamics  in  (1.1)  and  add  a  disturbance 
term  w  £  W  with  an  associated  coefficient  to  form  the  system 


x{t)  =  f(x)  +  g1(x)w(t)  +  g2(x)u(t) 
z(t)  =  [h(x),  u(t)]T 

x(t0)  =  x0,  (1.2) 


where  f(x),  fji(x)  and  g2(x)  are  the  new  system  matrices.  Consider  the  space  C2[ 0,  oo)  for 
all  piecewise-continuous  inputs  defined  on  [0,  oo)  satisfying 


|u(t)||2dt  <  oo. 


(1.3) 


The  nonnegative  number 

IMh  =  \\v(t)\\2d?j  (1.4) 

is  the  C2-norm  of  v.  Suppose  the  disturbance  term  w  is  a  function  in  £2[0,  00).  We  use  the 
concept  of  C2-gain  and  ,H.O0 -norm  to  bound  the  response  of  the  system  under  disturbance. 
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Definition  1.1.  (£2-gain).  Given  an  external  disturbance  w  G  W  and  the  corresponding 
output  z(t)  G  Z,  the  input-output  relationship 

f  ||z(t)||2  dt  <  72  f  \\w(t)\\2  dt  +  k(x0),  Vtf>t0,  (1.5) 

Jto  Jto 

is  bounded  for  any  tf  >  0,  initial  states  xq  ,  a  scalar  7,  and  some  bounded  function  n  such 
that  k(0)  =  0.  We  define  the  C,2-gain  from  w  to  2  to  be  the  ratio  between  the  £2-norm  of 
the  output  and  the  £2-norm  of  the  input,  which  is  bounded  by  7  according  to  (1.5). 

In  other  words,  for  any  disturbance  w  G  £2[0,  00),  the  response  of  the  system  from  the 
initial  states  xq  is  defined  for  all  tf  >  0,  which  produces  the  output  z  which  is  also  a 
function  in  £2[0,  00). 

Definition  1.2.  (T~L  oo-norm).  The  'H00-norm  of  the  system  is  the  maximum  gain  of  the 
system  for  all  £2-bounded  disturbances,  that  is,  the  £2-gain  of  the  system  from  tv  to  /“  is 
the  induced- norm  from  £2  to  £2: 

=  sup  ,2 ,  x(t0)  =  0,  (1.6) 

o^wgC2(o,oo)  IFRJ||2 

where  for  any  v  :  [G,  tf], 

/tf  m 

J2\vi(t)\2  dt-  (!-7) 

J  i= 1 

The  goal  of  disturbance  rejection  control  is  to  render  locally  the  £2-gain  of  the  system 
to  be  less  than  or  equal  to  7  such  that  all  state  trajectories  are  bounded  and  converge  to 
the  equilibrium  point.  Clearly,  this  can  be  alternatively  stated  as  the  minimization  of  the 
"Hoo-norm,  which  is  commonly  referred  to  as  'Hoa-control.  Later  in  this  chapter,  we  will 
formally  state  an  optimization  framework  for  the  'H00-norm  minimization  problem. 

To  model  parametric  uncertainties  in  the  system,  we  assume  that  the  uncertainties 
are  structurally  additive  as  opposed  to  a  more  general  multiplicative  form.  This  is  a 
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restrictive  assumption  done  to  preserve  the  control-affine  form  of  the  system  and  simplify 
derivations  for  the  solutions.  We  will  revisit  this  assumption  in  Chapter  5  and  present 
an  alternative  formulation  which  relaxes  it.  For  now,  we  form  the  system  dynamics  with 
additive  uncertainties  as 

Definition  1.3.  (Uncertain  affine  system  dynamics).  The  dynamics  of  the  system  is  given 
by  a  set  of  first-order  differential  equations 


x(t)  =  f(x)  +  Af(x,9,t)  +g1(x)w(t)+  g2(x)  +  Ag2(x,9,t)  u{t ) 

z(t)  =  [h(x),  u(t)]T 
x(t0)  =  x0, 


(1.8) 


where  x(t)  G  X  is  the  state  vector,  z{t)  G  Z  is  the  output  vector,  u(t)  G  U  is  the  vector  of 
controlled  inputs  to  the  system,  xq  is  a  vector  of  initial  states,  and  the  matrices  f(x),  g(x), 
and  h(x)  are  the  system  matrices  that  are  smooth  vector  fields  not  explicitly  parameterized 
by  time,  along  with  a  set  of  admissible  uncertainties  Af(x,  6,  t )  and  A g2(x,  6,  t ). 

Definition  1.4.  (Admissible  uncertainties).  The  parameter  6  G  0  is  the  uncertain  system 
parameter  which  belongs  to  the  set 


0  =  {6>|0  <9<9U}  (1.9) 

that  is  bounded  above  by  9U.  Parametrized  by  9,  the  functions  Af(x.  9,  t )  and  A g2(x,  9,  t ) 
are  uncertain  functions  belonging  to  the  set  of  admissible  uncertainties ,  defined  by 

A f(x,9,t)  =  H2(x)F(x,  9,  t)Ei(x) 

A g2(x, 9,  t )  =  g2(x)F(x,  9,  t)E2(x),  (1.10) 

for  \\F(x,9,t)\\2<  (3,  bounded  by  the  constant  /3.  Here,  H2,  E\,  and  E2  are  appropriate 
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state-dependent  weight  matrices  for  F(x,6,t). 

Intuitively,  we  assume  that  the  uncertainties  are  bounded  by  a  sphere.  This  is  a  gener¬ 
alization  of  the  admissible  uncertainties  used  in  guaranteed- cost  control  [128]  [33],  referred 
to  as  structured  and  matched.  It  is  useful  for  modeling  uncertainties  in  nonlinear  systems 
since  it  accounts  for  both  external  disturbances  and  parametric  uncertainties.  In  conjunc¬ 
tion  with  the  definition  of  the  1-Loo  norm,  we  can  formulate  an  optimization  problem  to 
address  both  disturbances  and  uncertainties  under  this  form. 

1.3.3  Optimization 

Given  the  system  dynamics  in  (1.8),  we  seek  a  control  input  u  which  stabilizes  the  system 
under  the  assumed  admissible  uncertainties.  To  accomplish  this,  we  formulate  the  following 
optimization  framework. 

Definition  1.5.  (Minimax  objective  function).  The  optimization  objective  take  the  form 


(1.11) 


=  mm  max 

uFlA  wFW 

oeo 


given  a  scalar  7  and  a  time  horizon  tf  >  to-  We  assume  that  x(t)  =  0  is  an  unique 
equilibrium  point  of  the  system  with  u[t)  =  0  and  w(t)  =  0. 

This  optimization  framework  is  called  minimax  in  the  sense  that  we  try  to  minimize  the 
objective  function  under  the  worst  possible  disturbances  or  parameter  variations,  which 
maximizes  the  same  objective  function.  It  is  also  clear  that  (1.11)  can  be  interpreted  as  a 
noncooperative,  two-player,  zero-sum  differential  game  where  one  player  seeks  to  minimize 
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while  the  other  player  seeks  to  maximize 


.^(x,  u,  w,  9) 


=  max  - 


wgw  2 
6>e0 


*WII2-r!IH*)ll2  dt- 


(1.13) 


These  opposing  objectives  are  related  by 


Ji(x,u,w,6)  =  —J2(x,u,w,9). 


(1.14) 


This  adversarial  optimization  problem  is  a  differential  game ,  since  both  players  are  con¬ 
strained  by  the  system  dynamics,  a  set  of  ordinary  differential  equations,  as  well  as  the 
associated  uncertainties.  Furthermore,  the  game  is  non-cooperative  as  each  player  chooses  a 
feasible  strategy  independent  of  the  other  player’s  strategy  and  the  payoff  depends  on  both 
players’  strategies.  The  outcome  is  zero-sum ,  since  one  player’s  gain  is  exactly  balanced 
by  the  other  player’s  loss. 

In  the  context  of  game  theory,  the  equilibrium  strategy  for  this  differential  game  consti¬ 
tutes  a  Nash  equilibrium  [17],  also  called  a  saddle-point,  which  can  be  intuitively  explained 
as  a  pair  of  strategies  secured  against  any  attempt  by  one  player  to  unilaterally  change  his 
strategy.  In  a  continuous-time  setting,  a  pair  of  strategies  (u*,  w*)  forms  a  Nash  equilibrium 


if 


(1.15) 

(1.16) 


Equivalently,  the  saddle-point  condition  indicates  that 


Unlike  static  games,  differential  game  theory  utilizes  the  concept  of  both  open  and  closed- 
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loop  solutions.  In  an  open-loop  setting,  both  players  formulate  their  strategy  at  the  mo¬ 
ment  the  system  starts  to  evolve,  based  on  information  available  at  the  time:  the  system 
dynamics,  the  objective  function,  and  initial  conditions.  This  strategy  cannot  be  changed 
once  the  system  evolves  and  the  saddle-point  solution  is  the  combination  of  strategies  of 
both  players  which  are  secured  against  any  attempt  by  one  player  to  unilaterally  change 
his  strategy.  Closed-loop  strategies  imply  that  the  strategy  can  be  changed  as  the  system 
evolves,  based  on  the  current  state  of  the  system.  Mathematically,  an  open-loop  strategy 
is  the  pair  (u(t),w(t),to)  as  functions  of  time  only,  and  a  closed- loop  strategy  is  the  pair 
(u(x,t),w(x,t))  as  functions  of  both  time  and  state.  We  will  discuss  open-loop  solutions 
in  detail  in  Chapter  4.3. 

One  key  assumption  to  a  differential-game  formulation  to  robust  control  is  the  notation 
of  rationality.  Both  players  in  this  adversarial  optimization  problem  are  assumed  to  act 
rationally  and  independently.  If  this  assumption  is  violated,  then  the  robust  controller  will 
exhibit  sub-optimality  or  conservatism.  An  extensive  discussion  of  this  issue  can  be  found 
in  Section  6.5,  in  the  context  of  experimental  results. 


1.4  Thesis  contributions 

In  this  thesis,  we  aim  to  improve  feedback  control  design  by  explicitly  modeling  sources 
of  uncertainties  and  provide  a  tractable  computational  approach  to  robust  control.  We 
make  three  novel  contributions  to  nonlinear  robust  control  for  the  problem  described  in 
Section  1.3.  Specifically, 

In  Chapter  3,  we  design  global  methods  that  provide  exact  and  approximated  solutions 
to  the  nonlinear  robust  control  problem.  We  derive  necessary  and  sufficient  conditions 
for  optimality  in  terms  of  a  Hamilton-Jacobi-Isaacs  equation,  which  forms  the  basis  of  a 
two-play  differential  game.  A  value  function  is  given  as  the  solution  of  the  HJI,  from  which 
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optimal  feedback  control  laws  can  be  derived  for  both  players.  We  interpret  these  feedback 
control  laws,  in  the  context  of  strategies  for  opposing  players  and  the  value  function.  For 
practical  solutions  of  the  HJI,  we  present  a  value  function  approximation  approach,  which 
relies  on  Fourier  basis  functions  in  connection  with  Galerkin’s  method. 

In  Chapter  4,  we  extend  direct  trajectory  optimization  to  nonlinear  robust  control, 
where  open- loop  trajectories  are  obtained  in  accordance  with  necessary  conditions  for  op¬ 
timality.  The  proposed  framework  transforms  a  minimax  optimization  problem  into  a  min¬ 
imization  problem  with  complementarity  conditions.  The  resulting  problem  is  solved  by 
trajectory  optimization,  which  transcribes  a  continuous-time  robust  control  problem  into 
an  equivalent  discrete  form  by  parameterizing  the  state  and  control  spaces  using  global 
polynomials  and  collocating  the  differential-algebraic  equations  using  nodes  obtained  from 
a  Gaussian  quadrature.  We  apply  the  open-loop  trajectories  in  a  receding-horizon  control 
setting. 

In  Chapter  5,  we  develop  a  heuristic  approach  to  robust  control  design  based  on  the 
concept  of  multiple  models,  which  evaluates  the  performance  of  a  given  control  law  over 
a  distribution  of  possible  system  dynamics.  A  distribution  of  models  allow  the  optimiza¬ 
tion  to  consider  the  expected  cost  over  a  range  of  possible  outcomes  and  account  for  the 
maximum  cost  for  the  worst-case  scenario.  We  improve  on  the  approaches  presented  in 
Chapter  3  and  Chapter  4  by  removing  restrictive  assumptions  made  on  the  forms  of  uncer¬ 
tainties,  which  enable  the  approach  to  compensate  for  effects  such  as  unmodeled  dynamics. 
We  develop  these  results  in  the  context  of  necessary  and  sufficient  conditions  for  individ¬ 
ual  value  functions  similar  to  Chapter  3  and  utilize  the  robust  trajectory  optimization 
approach  in  Chapter  4  to  compute  open- loop  trajectories  that  simultaneously  satisfied  the 
optimality  conditions  for  multiple  models.  We  demonstrate  the  use  of  these  trajectories  in 
a  receding-horizon  setting. 

These  contributions  are  united  with  the  framework  of  spectral  approximation,  which  we 
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introduce  in  Chapter  2.  Through  the  use  of  a  least-squares  approximation  framework,  we 
introduce  polynomial  approximation  which  generates  polynomials  that  exhibit  properties  of 
stability  and  convergence,  in  the  form  of  a  special  class  of  polynomials  known  as  orthogonal 
polynomials.  In  each  chapter,  we  use  a  two-dimensional  uncertain  system  as  a  benchmark 
problem  to  evaluate  the  proposed  approach.  In  Chapter  6,  we  give  simulation  results  for 
the  three  approaches  on  large  scale  systems  and  address  the  scalability  of  the  proposed 
approaches  in  terms  of  dimensionality. 

Compared  to  existing  literature  in  robust  control,  this  work  is  distinct  in  several  as¬ 
pects.  First,  we  remove  the  restrictive  assumptions  on  the  forms  of  uncertainties  that  can 
be  modeled  and  provide  a  general  framework  from  which  parametric  system  uncertainties, 
external  disturbances,  and  unmodeled  dynamics  can  be  simultaneously  addressed  in  a  con¬ 
sistent  optimization  framework.  Second,  we  provide  a  spectral  approximation  framework 
from  which  both  global  and  local  solutions  can  be  obtained.  Finally,  we  give  computa¬ 
tional  techniques  that  significantly  improve  the  scalability  nonlinear  robust  control  designs 
in  comparison  to  previous  work  and  provide  solutions  to  high-dimensional  problems  that 
were  previously  intractable. 
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2 

Background:  Spectral  Approximations 


2.1  Motivation 

A  central  idea  in  dynamic  programming,  reinforcement  learning,  and  trajectory  optimiza¬ 
tion  is  function  approximation,  the  use  of  a  combination  of  basis  functions  from  a  well- 
defined  class  that  closely  matches  a  given  continuous  function.  In  this  chapter,  we  review 
a  class  of  techniques  in  approximation  theory  and  numerical  analysis  known  as  spectral 
function  approximation.  We  do  so  with  a  focus  on  approximations  in  a  least-squares  sense , 
which  generates  polynomials  that  exhibit  properties  of  stability  and  convergence ,  in  the 
form  of  a  special  class  of  polynomials  known  as  orthogonal  polynomials.  These  function  ap¬ 
proximation  techniques  will  be  used  in  the  subsequent  chapters,  in  both  single-dimensional 
(trajectory)  and  multi-dimensional  (value  function)  settings.  For  a  comprehensive  overview 
of  function  approximation  using  spectral  methods,  we  refer  the  readers  to  Askey  [100], 
Quarteroni  [101],  Boyd  [129],  and  Shen  [130]. 

It  is  well  known  that  any  continuous  function  f(x)  can  be  represented  in  terms  of  n 
basis  functions 

(0o(®),0i(®),---,0n(®)}  (2-1) 

and  a  finite-dimensional  approximation  of  f(x)  can  be  done  using  a  linear  combination  of 
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these  basis  functions  in  the  form 

n 

hx)  =  ^2ciMx)>  (2-2) 

i=i 

where  the  variables  c*  are  coefficients  associated  with  the  basis  functions  (f>i{x).  In  this 
form,  the  monomial  basis 

4>k(x)  =  xk  (2.3) 

gives  a  polynomial  approximation  of  f(x).  The  Weierstrass  approximation  theorem  states 
that  for  every  continuous  function  dehned  on  an  internal  [a,  b\,  there  exists  a  polynomial 
Pn(x)  of  degree  n  such  that 

lim  (  max \f(x)  —  Pn(x)\ )  =0.  (2.4) 

n—> oo  \a<x<b  J 

In  other  words,  f(x)  can  be  uniformly  approximated  by  Pn(x)  to  arbitrary  accuracy  when 
n  is  sufficiently  large.  While  the  Weierstrass  theorem  opens  up  the  possibility  of  using 
polynomials  for  the  purposes  of  approximation,  it  does  not  specify  the  form  of  the  poly¬ 
nomial.  In  particular,  it  does  not  specify  constraints  on  the  basis  function  or  locations  at 
which  the  original  function  f(x)  is  to  be  sampled.  As  it  turns  out,  certain  choices  of  basis 
functions  and  sampling  schemes  can  lead  to  instability.  As  demonstrated  by  the  so-called 
Runge ’s  phenomenon ,  when  a  function  is  sampled  and  interpolated  at  equidistant  nodes 

2  i 

Xi  = - 1,  *  G  {0,1,..., n},  (2.5) 

n 

the  error  of  the  approximation 

lim  (  max  \f(x)  —  Pn(x)\)  =  oo  (2.6) 

n— oo  V  —  1  <x<  1  / 
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(a)  Blue:  Runge  function.  Green:  5th-order  poly¬ 
nomial  approximation.  Red:  9th-order  polynomial 
approximation. 


(b)  Approximation  error  for  5th,  7th,  11th, 
and  15th-order  polynomial  approximation  using 
equidistant  nodes. 


Figure  2.1:  Runge’s  phenomenon  for  the  function  f(x) 


i 

l+25x2  ' 


grows  without  bound  as  the  degree  n  of  the  polynomial  increases.  Fig.  2.1(a)  demonstrates 
Runge’s  phenomenon  for  the  function 


/p) 


i 

1  +  25a;2  ’ 


(2.7) 


where  5th  and  9th-order  polynomial  approximations  are  constructed  using  equidistant 
nodes.  While  a  higher-order  approximation  is  more  accurate  in  the  center,  the  errors  are 
higher  in  the  boundaries  compared  to  a  lower-order  approximation.  Fig.  2.1(b)  shows  the 
approximation  error  for  orders  n  =  5,7,11,15  where  the  errors  continue  to  accumulate 
despite  increasing  the  approximation  order. 

Runge’s  phenomenon  shows  that  when  attempting  to  approximate  a  given  function, 
the  selection  of  nodes  and  the  choice  of  basis  functions  play  important  roles  in  the  quality 
of  the  approximation.  A  stable  approximation  indicates  that  the  approximation  does  not 
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exhibit  Runge’s  phenomenon,  while  the  rate  of  convergence  indicates  the  rate  in  which  the 
approximation  error  is  decreased  when  the  degree  of  the  polynomial  is  increased. 


2.2  Least-squares  approximation 

We  pursue  a  well-formed  polynomial  approximation  in  the  least-squares  sense ,  as  typi¬ 
cally,  there  are  more  data  (interpolation  nodes)  than  the  unknowns  (coefficients  of  basis 
functions),  creating  an  overdetermined  system.  In  this  formulation,  the  coefficients  are 
obtained  by  evaluating  a  candidate  polynomial  and  minimizing  the  sum  of  the  squares  of 
the  errors  made  at  the  interpolation  nodes. 

Consider  the  weighted  £2-n orm  of  a  function  f(x) 

\\fh,w=  (^  (f{x))2w(x)  dx^j  .  (2.8) 

The  least-squares  approximation  problem  is  formulated  to  determine  the  polynomial  that 
is  closest  to  f(x)  such  that 

II  f{x)  ~  /*(a0||2,ti>=  min  ||  f(x)  -  f(x)\\2,w  (2.9) 

/O) 

In  other  words,  the  weighted  £2-norm  of  the  approximation  errors  are  minimized.  In  terms 
of  the  basis  functions  in  (2.2),  this  is  the  problem  of  determining  coefficients  c  such  that 

c*  =  argmin  ||  f(x)  -  f(x) \\2,w,  (2.10) 

C 

which  gives  the  orthogonal  projection  of  f(x)  onto  the  hnite-dimensional  basis.  Instead  of 
minimizing  the  the  £2-norm  of  the  difference,  we  can  minimize  its  square.  Let  E  denote 
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the  square  of  the  weighted  £2-distance  between  f{x)  and  f(x) 


E(c0,...,cn)=  w (x)  ( / (x)  -  f(x) )  dx, 


(2-11) 


which  is  a  quadratic  function  of  the  coefficients.  A  necessary  condition  for  optimality  for 
this  unconstrained  minimization  problem  is 


dE 

dcj 


=  0. 

c=c* 


By  taking  the  partial  derivative  and  simplifying,  we  obtain 


(2.12) 


W[X 


)(/)j(x)f(x)  dx  +  2  c*  I  w(x)(/)i(x)</)j(x)  dx  —  0,  (2.13) 


i= o 


which  implies  that 


n  nb  rb 

c*  /  w(x)(f>i(x)<f)j(x)  dx  =  /  w(x)4>j(x)f(x)dx,  j  —  0,...,n.  (2-14) 

• — pi  J  a  J  a 


It  is  clear  that  if  the  integral  term  on  the  left-hand  side  of  (2.14)  evaluates  to  a  scalar, 
then  the  problem  is  simplified  considerably  and  we  can  obtain  a  closed-form  expression  for 
c*  in  terms  of  other  variables.  In  the  simplest  case,  this  can  be  done  using  basis  functions 
4>i(x)  and  (f).j(x)  associated  with  orthonormal  polynomials ,  defined  on  the  inner  product 
with  respect  to  a  weight  w(x) 


f 


<f>i(x)(f)j(x)w(x)  dx  =  8, 


*7 


1  i  =  j 

0  i  ^  j  ’ 


(2.15) 


where  V  is  the  Kronecker  delta  function.  When  the  basis  function  satisfy  this  orthogonality 
property,  the  integral  on  the  left-hand  side  of  (2.14)  evaluates  to  1  and  can  be  removed 
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entirely,  resulting  in  the  coefficients 


rb 

c*  =  /  <pi(x)f(x)'w(x)dx,  i  =  0,...,n. 


(2-16) 


For  more  general  problems,  with  polynomials  that  satisfy 

fb 

/  <; bi{x)4>j{x)w{x )  dx  =  Sij  = 

J  a 

with  f^((pi(x))2w(x)  dx  that  does  not  necessarily  evaluates  to  1,  the  basis  functions  form 
orthogonal  polynomials.  Consequently,  the  solution  to  the  least  squares  problem  is  given 
by  the  coefficients 

c*  =  fgMtffWwWfa 

f^((j)i(x))2w(x)  dx 

In  this  formulation,  to  approximate  a  given  function  f(x)  involves  selecting  a  basis 
function  (f>{x)  and  computing  the  coefficients  c.  In  the  subsequent  sections,  we  will  de¬ 
scribe  different  choices  for  orthogonal  polynomials  and  a  way  of  computing  the  (2.18) 
using  Gaussian  quadrature. 


2  =  0,. 


:  n. 


(2.18) 


£((f)i(x))2'w(x)dx  i=j 

0  i^j 


(2.17) 


2.3  Orthogonal  polynomials 


Consider  the  weighted  Sobolev  space 


£%(a,b)  =  <  f  :  /  f2(x)w(x )  dx  <  +oo 


(2.19) 
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defined  on  an  interval  (a,  b)  with  the  weight  function  w(x).  An  inner  product  for  functions 
/  and  g  defined  on  this  space  takes  the  form 

(f,9)w=  [  f(x)g(x)w(x)dx.  (2.20) 

J  a 

We  note  that  the  the  /Vnorm  can  be  expressed  using  the  inner  product 

ll/lk=  (/./)!  (2.21) 

and 

(f,g)  2  =  o 

if  /  and  g  are  orthogonal  to  each  other.  A  polynomial 

Pn(x)  =  xn  +  a™_1xn_1  H - f 

with  degree  n  is  orthogonal  in  £^(o,  b )  if 

C Pi,Pj)w  =  0,  for  i  ^  j.  (2.24) 

For  any  given  positive  weight  function  w(x),  there  exists  a  unique  set  of  orthogonal  poly¬ 
nomials  that  can  be  constructed  as  a  three-term,  recurrence  relation 

Po  =  1 
Pi  =  x  — 

Pn  =  (x  ~  an+i)pn  ~  Pn+iPn-t,  n  >  1,  (2.25) 


(2.22) 


(2.23) 
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where  the  coefficients  a  and  /3  are  given  by 


Ot\ 


w{x)xdx  /  w(x) dx 


C++ 1 


(2.26) 


All  orthogonal  polynomials  can  be  constructed  using  these  three-term  recurrence  relations, 
and  among  the  most  popular  classical  orthogonal  polynomials  is  the  Jacobi  family  of  poly¬ 
nomials,  defined  with  the  weight 


w(x)  —  (1  —  x)a(l  +  x)P  for  a,  (3  >  —1,  (a,b)  =  (—1,1).  (2.27) 


By  definition,  Jacobi  polynomials  are  orthogonal  since  the  choice  of  weight  function  pro¬ 
duces 

J  Pn(x)Pm(x)(l  ~  x)a(^  +  XY  dx  =  0,  for  n  ^  m.  (2.28) 

Legendre  and  Chebyshev  polynomials  belong  to  the  Jacobi  family  and  can  be  generated  by 
varying  the  parameters  a  and  beta.  For  a  =  (3  =  0,  we  form  the  Legendre  polynomials. 
For  a  =  (3  =  —  we  get  the  Chebyshev  polynomials.  For  now,  we  focus  mainly  on  the 
Legendre  polynomials. 

On  an  interval  (a,  b )  =  (—1, 1)  with  w(x)  =  1,  the  Legendre  polynomial  is  defined  as 

Lq{x)  =  1 
Li(x)  =  x 

3-jn+ 1 (*^)  !  7~xLn{x')  ;  yLn_i(^c),  n  >  1.  (2.29) 

n+1  n + 1 
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It  is  easy  to  check  that  Legendre  polynomials  are  orthogonal  in  [—1, 1]  since 

J  ^  Li{x)Lj(x)dx  =  (2.30) 

The  Legendre  polynomial  can  also  be  defined  in  terms  of  its  derivatives  L'  such  that 

Ln(x)  =  ^T[(Ln+  l(X)  -  Ln-l(X))’  n  >  h  (2-31) 

where  the  first  and  second  derivatives  are  given  by 

n—  1 

L'n(x)  —  k  +  l)Lfc(x),  A:  +  nisodd 
k= o 

L"(x)  =  (k  +  i)(n(n  +  1)  -  k(k  +  1  ))Lk(x).  (2.32) 

A  sequence  (L'(x)}  is  mutually  orthogonal  with  respect  to  w(x)  =  1  —  x2  since 

L'^)L'j(X)(  1  -  i2)  dx  =  (2.33) 

By  using  Legendre  polynomials,  we  have  fixed  the  form  of  the  basis  functions  in  (2.2). 
What  remains  is  the  computation  of  corresponding  coefficients  in  the  form  of  (2.18),  which 
we  will  describe  in  the  next  section. 
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2.4  Gaussian  quadrature 

Previously  in  (2.18),  we  have  shown  that  the  coefficients  in  a  least-squares  polynomial 
approximation  are  given  by 


,*  _  Ja4>i(X)f(X )w(x)dx 


(2.34) 


With  the  choice  of  orthogonal  polynomials,  the  weight  function  w(x)  and  basis  functions 
<pi  are  fixed.  The  evaluation  of  expression  now  depends  on  the  computation  of  definite 
integrals,  subject  to  the  selection  of  samples  for  the  function  f(x).  We  approach  the 
integration  problem  numerically  and  show  that  the  choice  of  interpolation  nodes  can  in 
fact  be  given  in  association  with  numerical  quadratures. 

Weighted  quadratures  are  numerical  integration  schemes  of  the  form 

/I  n 

f(x)w(x )  dx  &  ^  Aif(xi )  (2.35) 

1  i= o 

over  an  the  interval  [—1,1]  where  w(x)  >  0  is  a  weight  function.  Quadratures  of  this  form 
applies  generally  to  intervals  [a,  b]  by  a  simple  linear  transformation.  Let 


(2.36) 


Then 


(2.37) 


which  implies  that 


(2.38) 
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For  brevity,  we  will  use  the  form 


f(x)w(x)  dx 


n 

E 

i= 0 


Af(xi) 


(2.39) 


in  the  subsequent  derivations  and  assume  that  the  transformation  has  been  done  implicitly. 

Numerical  quadratures  operate  by  sampling  the  given  function  f{x)  at  selected  nodes 
(sometimes  referred  to  as  abscissas )  and  quantifying  the  order  of  the  quadrature  by  ex¬ 
amining  the  highest  degree  polynomial  in  the  integrand  for  which  the  quadrature  is  exact. 
For  example,  three-point  and  five-point  centered  difference  formulas  are  exact  when  ap¬ 
plied  to  second-degree  and  fourth-degree  polynomials,  respectively.  If  the  choice  of  nodes 
xo,  xi, . . . ,  xn  are  made  a  priori,  for  example,  with  equidistant  nodes,  then  resulting  scheme 
(known  as  Newton-Cotes)  has  the  coefficient 


A  = 


rw*  -  a) 

YlifriXi  -  Xj) 


dx, 


(2.40) 


which  can  be  shown  to  be  exact  if  the  integrand  is  a  polynomial  of  degree  at  most  n. 

In  this  approach,  the  choice  of  nodes  is  fixed  with  n  +  1  number  of  unknown  coefficients 
Ai,  which  are  explicitly  dependent  on  the  nodes.  As  observed  by  Gauss,  if  we  can  deter¬ 
mine  the  location  of  the  nodes  (for  example,  in  a  non-equidistant  fashion)  in  conjunction 
with  coefficients,  then  we  have  twice  as  many  parameters  to  choose  from  to  maximize  the 
accuracy  of  the  quadrature.  With  2n  +  2  degrees  of  freedom,  the  result  is  a  higher  degree 
of  accuracy  for  the  quadrature  without  needing  to  increase  the  number  nodes,  in  a  scheme 
known  as  Gaussian  integration. 

Consider  the  polynomial  f(x)  of  degree  2n  +  1  in  the  form 


f(x)  =  q(x)p(x)  +  r(x), 


(2.41) 
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where  polynomials  p(x),  q(x),  and  r(x)  are  all  of  degree  n.  p(x)  and  q{x)  are  orthogonal 
with  respect  to  a  weight  function  w(x),  i.e., 

p(x)q(x)w(x)  dx  =  0.  (2.42) 

If  x0,  xi, . . . ,  xn  are  the  roots  of  q(x),  then 

f(xi )  =  r(xi),  (2.43) 


and  the  quadrature  takes  the  form 


nb  rb 

/  f(x)w(x)dx=  /  [q(x)p(x)  +  r{x)\  w(x)  dx. 

'  a  J  a 


(2.44) 


Since  q(x)  and  p(x)  are  assumed  to  be  orthogonal,  we  can  simplify  the  integral  to  be 

nb  rb  n  n 

/  f(x)w{x)dx=  /  r(x)w(x)  dx  =  ^^Air(xi)  =  '^^Aif(yXi).  (2-45) 

J  a  J a  i— n  i — n 


In  other  words,  we  have  demonstrated  that  by  selecting  the  roots  of  an  orthogonal  poly¬ 
nomial,  we  can  obtain  a  quadrature  to  accurately  integrate  functions  of  degrees  at  most 
2n  +  1.  This  Gaussian  quadrature  is  optimal  in  the  sense  that  it  is  impossible  to  pick 
{xj,  w:) }('=0  for  a  quadrature  that  is  accurate  for  f{x)  of  degree  2 n  +  2. 

All  orthogonal  polynomials  can  be  shown  to  yield  roots  Xj  that  are  real,  simple  (of 
multiplicity  one),  and  are  in  the  interval  (a,  b)  for  which  the  polynomial  is  defined  on.  In 
particular,  since  Legendre  polynomials  are  solutions  to  Legendre’s  differential  equation 

+  n(n  +  l)Pn(x)  =  0,  (2.46) 

Picard’s  existence  theorem  guarantees  that  the  all  roots  are  distinct  and  real.  This  is  a 


d 

dx 


(1  -  x2)-^Pn(x) 
dx 
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useful  property,  as  the  use  of  Legendre  polynomials  with  Gaussian  nodes  results  in  the 
Gauss- Legendre  quadrature ,  with  the  weight  function 

2 

(l-*?)[^nW 

The  location  of  interpolation  nodes  are  given  by  the  roots  of  a  n-degrec  Legendre  poly¬ 
nomial.  Table  2.1  gives  node  locations  and  weights  for  Legendre  polynomials  of  degree  at 
most  n  =  5. 


Table  2.1:  Node  location  and  weights  for  Gauss-Legendre  quadratures 


Number  of  nodes  n 

Node  location  x  i 

Weights  w_i 

i 

0 

2 

2 

±\A73 

1 

3 

0 

iv/3/5 

8 

9 

5 

9 

A 

±\/(3  -  2\/67s)/7 

(18+V30) 

36 

L ± 

±(/(3  +  2^6751/7 

(18— \/30) 

36 

0 

128 

225 

5 

±3  A5-  V!0/7) 

332+12\/80 

900 

±i(/(5  +  2^10/7) 

332-12780 

900 

For  the  problem  of  approximating  the  Runge  function  (2.7),  Fig.  2.2  compares  Gauss 
nodes  with  traditional  equidistant  nodes.  Fig.  2.2(a)  shows  the  result  of  the  two  approx¬ 
imation  schemes  compared  to  the  true  Runge  function,  where  square  and  circle  markers 
indicates  the  location  of  the  respective  nodes.  Fig.  2.2(b)  plots  the  error  of  the  Gauss 
nodes  for  polynomials  of  increasing  complexity.  Compared  to  the  error  plot  in  Fig.  2.1(b), 
we  see  that  the  choice  of  Gauss  nodes  results  in  a  convergent  approximation  which  avoids 
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(a)  Blue:  Runge  function.  Green:  approximation  (b)  Approximation  error  for  5th,  7th,  11th,  15th, 

with  equidistant  nodes.  Red:  approximation  with  30th,  60th,  and  120th-order  polynomial  approxi- 
Gauss  nodes.  Circle:  location  of  equidistant  nodes.  mation  using  Gauss  nodes. 

Square:  location  of  Gauss  nodes. 


Figure  2.2:  Approximations  for  the  function  f(x)  =  1+^5a,2 


Runge’s  phenomenon. 

The  quadratures  we  have  discussed  thus  far  do  not  include  nodes  at  the  end  points 
±1  of  the  integration  interval.  For  certain  applications,  such  as  boundary  value  problems 
discussed  in  Chapter  4,  end  points  must  be  explicitly  included  to  enforce  Dirichlet  boundary 
conditions.  Simple  variations  on  the  Gauss-Legendre  quadrature  include  the  Gauss-Radau 
and  Gauss-Lobatto  quadratures.  In  Gauss-Radau,  either  (but  only  one)  endpoint  can  be 
included  as  a  node  in  the  approximiation,  which  leaves  2n  —  1  remaining  parameters  with 
an  accuracy  of  2 n  —  2.  In  Gauss-Lobatto,  both  endpoints  are  included  as  nodes,  which 
results  in  an  accuracy  of  2n  —  3.  Gauss-Lobatto  quadratures  take  the  form 

/i  2  n~l 

f(x)w(x )  dx  «  — - —  (/(  1)  +  /(- 1))  +  V  Wif(xi )  +  Rn,  (2.48) 

-l  nKn  ~  1) 
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where  the  weight  wt  is  given  by 


2 

n(n  -  ^(Pn-^Xi))2' 


Xi  ^  ±1, 


(2.49) 


and  the  remainder  Rn  is 


-n(n-l)322"  1[(n-2)!]4  2n_2  ^  ^  i 

(2n  —  l)[(2n  —  2)!  ]3  ’  K)’  _1  <  {  <  1 


(2.50) 


The  interpolation  nodes  are  given  by  the  roots  of  P'^-^x).  Table  2.2  gives  node  locations 
and  weights  for  Gauss-Lobatto  quadratures  of  degree  at  most  n  =  5.  In  terms  of  con- 


Table  2.2:  Node  location  and  weights  for  Gauss-Lobatto  quadratures 


Number  of  nodes  n 

Node  location  x_i 

Weights  w_i 

3 

0 

4 

3 

±1 

1 

3 

4 

5 

6 

±1 

1 

6 

0 

32 

45 

3 

V 7 

49 

90 

±i 

1 

10 

vergence,  there  are  no  differences  between  the  choice  of  Gauss,  Gauss-Radau,  and  Gauss- 
Lobatto  nodes.  For  general  orthogonal  polynomials,  is  rate  of  convergence  is  exponential 
in  terms  of  the  degrees  of  the  approximation 


ll/W  e 


(2.51) 


In  other  words,  for  a  sufficiently  smooth  function,  as  the  degree  n  of  the  polynomial  ap¬ 
proximation  f*  is  increased,  there  is  an  exponential  decrease  in  the  corresponding  error. 
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This  a  property  known  as  spectral  accuracy ,  an  unique  property  not  enjoyed  by  other  forms 
of  function  approximations,  such  as  splines,  RBfs,  and  wavelets  [129]. 

2.5  Fourier  approximation 

For  the  approximation  of  periodic  functions,  the  orthogonal  polynomials  described  in  the 
previous  section  is  not  appropriate  since  the  polynomials  are  not  periodic  by  definition. 
Instead,  we  describe  the  use  of  Fourier  basis  functions  to  achieve  similar  spectral  approx¬ 
imation  of  periodic  functions  and  we  demonstrate  that  these  trigonometric  polynomials 
satisfies  the  orthogonality  conditions  similar  to  orthogonal  polynomials. 

Consider  a  periodic  function  f(x)  on  the  interval  [0,  2ti\.  Periodicity  indicates  that  f(x) 


satisfies 


f(x  +  2t r)  =  f(x). 


(2.52) 


We  use  C2(0.  2tt)  to  denote  the  space  of  complex- value  functions  that  are  square  integrable 
over  [0,  27t],  that  is, 


such  that 


(2.53) 


The  inner  product  of  functions  /  and  g  is  defined  as 


(f,g)=  /  f(x)g(x)dx 


(2.54) 


0 


and  the  norm  is 


\\f\\ci^=  VUj)- 


(2.55) 
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Fourier  trigonometric  polynomials  are  defined  using  the  basis 


<j>k{x)  =  elkx  =  cos(kx)  +  ism(kx),  k  =  0,  ±1,±2, ...  (2.56) 


It  is  clear  that  the  Fourier  basis  (2.56)  is  orthogonal  since 


r2n  r2n  .  i  i  —  j 

{(f>j,(l>k)=  /  <f>j(x)<j)k(x)  dx  =  /  el^~k)x  dx  =  27r<%  =  <  ,  (2.57) 

do  do  I  0  i  7^  j 


where  5ij  is  the  Kronecker  delta  function.  The  polynomial  is  then  the  linear  combination 
of  the  basis  in  the  form 

OO 

F=  (2-58) 

k=— oo 

where  the  coefficient  is 

1  r2'K  1 

A  =  2 -I  f(x)e-‘bdx=-(f,^).  (2.59) 

For  the  choices 

1  f27r 

dk  =  —  /  (a(x)cos(kx)  +  P(x)  sin(kx))  dx 

2“  do 

1  f2n 

bk  =  —  /  (— a(x)  cos(kx)  +  /3(x)  sin(fcr))  dx,  (2.60) 

2tt  do 

the  Fourier  coefficients  can  be  written  as 


fk  =  dk  +  ibk ,  k  =  0,  ±1,±2, ...  (2.61) 


It  is  also  clear  that  for  real  valued  functions, 


f-k  =  fk  Vfc.  (2.62) 
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Following  the  least  squares  function  approximation  described  in  (2.9),  we  have 


II  f(x)  ~  /iv(^)IU2(o,27r)=  min  \\f(x)  -  f{x)\\C2(0,2n) 


f(x) 


(2.63) 


where  f^(x)  is  a  hnite  truncation  of  order  N,  which  gives  the  approximation 


N-l 

2 


/w~  E  ftN)e‘k’- 

u—  N-l 


(2.64) 


The  coefficients  are  given  by 


2m  jk 


N 


(2.65) 


Similar  to  orthogonal  polynomials,  the  spectral  accuracy  with  respect  to  the  number  of 
nodes  N  is 


II  f(x)  -/(t>IU2(0,2^)~  e  N 


(2.66) 


which  is  an  exponential  convergence. 

2.6  Summary 

In  this  chapter,  we  presented  a  consistent  framework  in  which  continuous-time  functions 
can  be  approximated  as  linear  combinations  of  basis  functions.  This  approximation  is  done 
in  the  least-squares  sense,  which  evaluates  a  candidate  polynomial  and  minimizes  the  sum 
of  the  squares  of  the  errors  made  at  the  interpolation  nodes.  We’ve  determined  that  by 
using  orthogonal  polynomials  as  basis  functions  and  sampling  at  Gauss  nodes,  the  resulting 
approximation  avoids  Runge’s  phenomenon  and  exhibits  spectral  accuracy. 
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The  function  approximation  framework  we  have  presented  is  done  in  the  context  of 
interpolation,  where  a  known  function  f(x)  is  given  (as  least  in  terms  of  samples).  In  the 
context  of  control  problems,  the  function  that  we  wish  to  approximate  is  the  result  of  a 
constrained  optimization  problem  and  is  unknown.  While  we  can  still  select  the  form  of 
basis  functions  based  on  the  orthogonality  conditions,  the  coefficients  for  the  basis  function 
are  computed  through  optimization  instead  of  quadratures.  This  is  done  by  enforcing 
system  dynamics  and  associated  constraints  at  collocation  points,  which  are  interpolation 
nodes  described  in  Section  2.4.  We  will  address  these  issues  in  detail  in  the  subsequent 
chapters. 
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Global  Solutions 


3.1  Motivation 

In  Section  1.3,  we  described  a  'H0o  robust  control  problem  as  a  two-player  differential 
game  with  parametric  uncertainties  in  the  system  dynamics.  The  two  players  formulate 
respective  strategies  in  a  non-cooperative  and  zero-sum  fashion,  which  results  in  a  Nash 
equilibrium  as  a  saddle-point  strategy.  In  this  chapter,  we  establish  necessary  and  sufficient 
conditions  for  optimality  and  formulate  global  solutions  to  this  differential  game. 

Global  solutions  are  difficult  to  obtain  in  nonlinear  optimization  problems  in  general.  In 
optimal  control,  the  application  of  the  dynamic  programming  principle  in  a  continuous-time 
problem  with  nonlinear  dynamics  results  in  the  Hamilton-Jacobi-Bellman  (HJB)  equation, 
a  nonlinear  first-order  partial  differential  equation  that  is  the  necessary  and  sufficient  con¬ 
dition  for  global  optimality.  For  nonlinear  robust  control  problems  in  the  Hoc- style,  the 
dual  to  the  HJB  is  the  Hamilton-Jacobi-Isaacs  (HJI)  equation,  for  which  the  solution  is  the 
Bellman  value  function.  As  reviewed  in  Section  1.2.2,  these  partial  differential  equations 
are  extremely  difficult  to  solve. 

In  this  chapter,  we  characterize  global  solutions  to  the  differential  game  formulation  laid 
out  in  Section  1.3  from  a  value  function  approximation  perspective.  We  reviewed  related 
work  in  Section  1.2.3  and  Section  1.2.4.  We  derive  necessary  and  sufficient  conditions 
for  optimality  for  the  Hoo  problem  in  Section  1.3  in  terms  of  the  Isaacs  equation,  then 
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approximate  the  value  function  as  a  linear  combination  of  basis  functions.  We  build  upon 
the  spectral  approximation  framework  in  Chapter  2,  which  suggests  basis  functions  in  terms 
of  orthogonal  polynomials,  resulting  in  an  approximation  that  exhibits  spectral  accuracy. 

The  result  of  this  chapter  is  an  characterization  of  the  value  function  from  the  perspec¬ 
tive  of  differential  games,  which  is  critical  in  obtaining  explicit  feedback  control  laws  that 
guarantee  the  asymptotic  stability  of  the  closed-loop  system,  without  requiring  the  explicit 
form  of  the  parametric  uncertainties.  For  the  practical  implementation  of  the  proposed 
approach,  we  demonstrate  the  value  function  approximation  framework  for  a  simple  two- 
dimensional  differential  game,  which  will  be  used  as  a  baseline  comparison  for  the  methods 
in  the  subsequent  chapters. 


3.2  Necessary  and  sufficient  conditions 


In  Section  1.3,  we  formulated  a  robust  Hoo-control  problem  with  the  nonlinear  dynamics 


x(t)  =  f(x)  +  Af(x,e,t)  +  g1(x)w(t)  +  g2(x)  +  Ag2(x,9,t)  u(t ) 
z{t )  =  [h(x),  u(t)]T 

x(t0)  =  x  o,  (3.1) 


with  x(t)  G  X  as  the  state  vector,  z(t)  G  Z  is  the  output  vector,  u(t)  G  U  and  w(t)  G  W 
are  the  minimizing  and  maximizing  players,  respectively,  xo  is  a  vector  of  initial  states ,  and 
the  matrices  f(x),  g(x),  and  h(x)  are  the  system  matrices  that  are  smooth  vector  fields  not 
explicitly  parameterized  by  time.  The  parameter  9  G  0  is  the  uncertain  system  parameter 
which  belongs  to  the  set 

©  =  {0|O  <9<6U}  (3.2) 
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that  is  bounded  above  by  9U.  Parametrized  by  9,  the  functions  A f(x,  9 ,  t)  and  Ag2(x,  0,  t ) 
are  uncertain  functions  belonging  to  the  set  of  admissible  uncertainties ,  defined  by 


A  f(x,9,t)  =  H2(x)F(x,9,t)Ei(x) 

A g2(x,  9 ,  t)  =  g2(x)F(x,  9 ,  t)E2 (x),  (3.3) 


for  \\F(x,9,t)\\2<  ft,  which  intuitively,  assumes  that  the  uncertainties  are  bounded  by  a 
sphere.  Here,  H2,  E\ ,  and  E2  are  appropriate  state-dependent  weight  matrices  for  F(x,  9,  t ). 
The  minimax  objective 

J(x,  u,  w,  0)  =  min  max  - 

ueU  weW  2 

eee 

entails  a  two-player,  noncooperative,  zero-sum  differential  game  over  a  horizon  tf  >  to- 
We  begin  the  derivation  of  the  necessary  and  sufficient  condition  for  optimality  for  this 
differential  game  by  characterizing  its  saddle-point  solution.  This  is  the  Bellman  value 
function 


rtf  r 


'to 


m  ii2-72ik*)ii2 


dt 


(3.4) 


Definition  3.1.  (Bellman  value  function).  The  value  function  V(x,t)  defines  the  cost-to- 
go  from  any  initial  state  x  and  any  time  t  such  that 


V(x,t)  =  inf  sup 
U^t  we w 
eee 


72||tc(r) 


dr. 


(3.5) 


In  other  words,  the  value  function  bounds  the  cost  and  provides  the  optimal  cost-to-go  for 
any  state  x(t),  i.e., 


V(x(tf),tf)  <  J*(x,  u*,  w*,  9)  <  V(x(t0),t0), 


(3.6) 


where  u*(t)  and  w*  (t)  are  the  optimal  controls  of  the  system  ( w*{t )  is  optimal  in  the  sense 
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that  it  provides  the  worst-case  disturbance  to  the  system).  We  use  the  notation 


Vx  (x,t) 


dV(x,t) 
dx(t)  ’ 


Vt{x,t) 


dV{x,t) 

Ft 


(3.7) 


to  denote  the  row  vectors  of  first  partial  derivatives  of  the  value  function  with  respect  to  t 
and  x(t),  respectively.  For  brevity,  we  drop  the  time  index  t  when  it  is  immaterial. 

We  use  basic  optimization  theory  to  derive  the  optimal  control  and  the  associated  neces¬ 
sary  and  sufficient  condition  for  optimality.  First,  we  convert  the  constrained  optimization 
problem  in  (3.4)  to  an  unconstrained  form  by  using  the  Hamiltonian. 

Definition  3.2.  (Hamiltonian).  The  Hamiltonian  H(x,  A,  u,  w,  9)  for  the  system  dynamics 
in  (3.1)  and  objective  function  (3.4)  is 


H(x,Vx,u,w,0)  =  Vx(x)(^f(x)  +  Af(x,9,t)  +  gi(x)w(t) 

+  [92(x)  +  Ag2(x,e,t)]u(t)j  +  ]^\\z{t)\\2-]p2\\w(t)\\2 ,  (3.8) 


where  the  A  is  the  costate  or  adjoint  vector,  which  in  this  case,  is  the  spacial  derivative  of 
the  value  function  Vx. 

The  Hamiltonian  combines  the  right-hand  side  of  the  system  dynamics  in  (3.1)  and  the 
one-step  cost  within  the  integral  in  the  objective  function  (3.4),  therefore  transforming  a 
constrained  optimization  problem  into  an  unconstrained  one.  In  this  unconstrained  form, 
a  necessary  condition  for  optimality  is  Isaacs  ’  condition ,  which  states  that 


inf  sup  H(x,  Vx,  u,  w,  6)  =  sup  inf  H(x,  Vx,  u,  w,  6).  (3.9) 

iieu  uiew 

6>e©  6>e© 

Equivalently,  this  is 


H(x,  Vx,  u*,  w,  6)  <  H(x,Vx,u* ,w* ,9)  <  H(x,Vx,u,w* ,9). 


(3.10) 
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For  the  Hamiltonian  to  be  minimized  over  U ,  the  set  of  all  possible  controls,  while  maxi¬ 
mized  over  W,  the  set  of  all  possible  disturbances,  we  write 

min  sup  H(x,  Vx,  u,  w,  6)  <  0.  (3.11) 

u&A 

6>ee 

Effectively,  since  we  used  Vx[x)  as  the  costate,  the  value  function  is  the  solution  to  this 
minimization  problem.  Now,  we  derive  several  simple  identities  which  will  be  helpful  in 
subsequent  discussion.  First, 

—  F(x,6,t)E1(x)  >0,  (3.12) 

since  by  definition,  a  squared-norm  is  positive  semi-definite.  Also  by  definition,  || F(x,  0,  t)  ||2< 
/?,  and  hence  we  can  expand  the  left-hand  side  as 

I HZ(x)V?(x)-F(x,0,t)E1{x)  2  =  Vx(x)H2(x)H^(x)V^(x) 

—  2Vx(x)H2(x)F(x,d,t)Ei(x)  + /32E^(x)Ei(x)  >0.  (3.13) 

Rearranging  the  terms,  we  arrive  at 

Vx(x)H2(x)F(x,  9,  t)Ei(x)  <^[vx(x)H2(x)Hj(x)V^(x)  +  /32E^(x)E1(x)'\.  (3.14) 

Likewise,  we  can  establish  that 

/  -  ]-F(x,9,t)E2(x)  =  I  -  F(x,0,t)E2(x)  +  /32 E% (x) E2(x)  >  0,  (3.15) 

Z  t: 

where  /  is  the  identity  matrix.  This  implies 

F(x,6,t)E2(x)  <  /  +  i/32 E% (x) E2(x) .  (3.16) 
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Going  back  to  (3.11),  we  resolve  the  inner  maximization  problem  first  by  taking  the  supre- 
mum  of  the  Hamiltonian.  This  is  accomplished  by  taking  the  Hamiltonian  in  (3.8)  and 
substituting  the  definitions  of  the  parametric  uncertainties  in  (3.3),  which  forms  an  in¬ 
equality 


sup  H{x,Vj ,u,w,0)  <  Vx(x)\(f(x)  +  H2(x)F(x,6,t)E1)+g1(x)w(t) 

-W  Gt -G  L  V  / 


weWfleQ 


+  [g2(x)  +  g2(x)F(x,6,t)E2)u(t)  +^\\z(t)\\2-^2\\w(t)\\2.  (3.17) 


Expanding  the  expression  and  applying  the  identities  (3.14)  and  (3.16),  we  have 


sup  H(x,V?,u,w,0)  <  Vx (x)f(x)+  \vx(x)H2(x)HI{x)VJ(x)f\i32EI{x)E1(x) 

w£\V  LZ  Z 

+  Vx{x)g1[x)w{t)  +  Vx(x)g2(x)  (2I  +  ^/32E% (x)E2{x)  \  u(t ) 


+  hT(x)h(x )  +uT(t)u(t)  -  ^72||w(£)||2),  (3.18) 


where  the  expressions  for  F(x,9,t )  have  been  eliminated  in  the  Hamiltonian,  and  hence 
the  supremum  is  no  longer  subject  to  0  G  0.  According  to  the  first-order  necessary 
condition  for  optimality,  we  can  obtain  expressions  for  u(t)  and  w(t)  by  differentiating  the 
Hamiltonian  with  respect  to  u(t)  and  w(t),  respectively,  then  solving  for  the  unknowns  as 
functions  of  Vx(x).  Since  most  of  the  terms  in  (3.18)  are  not  dependent  on  either  u(t)  or 
w(t),  we  can  easily  simplify  the  expressions  to 

u*(t)  =  -  (^21  +  ^P2E2  (x)E2(x)Sj  g2Vj(x) 

w*(f)  =  ^9i(x)Vx(x),  (3.19) 

which  are  the  optimal  feedback  laws.  Substituting  these  expressions  back  into  (3.8),  we 
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obtain  the  equation 


Vx(x)f(x)  +  -Vx{x) 


\gi{x)gT{x)  +  h2{x)hI  (x) 

.7 


-g2(x)l2I  +  -fE'(x)E2(x))g'(x) 


VxT(x )  +  -hT(x)h(x)  +  -j32E1(x)E[(x)  <  0. 

Zj  £ 


(3.20) 


This  is  the  Hamilton- J acobi- Isaacs  equation  for  the  optimization  problem  laid  out  in  (3.1). 
Given  the  admissible  uncertainties  in  (3.3),  this  is  a  more  general  version  of  the  Isaacs 
equation  as  compared  to  [33].  It  can  be  made  even  more  general  by  assuming  non-affine 
dynamics,  as  described  in  [15],  but  as  we  shall  see  in  Section  3.3,  we  can  leverage  the 
structures  of  this  equation  to  improve  computational  efficiency. 

By  examining  the  form  of  the  feedback  control  laws  in  (3.19),  it  is  clear  that  while  both 
players’  strategies  are  non-cooperative  and  independent  of  each  other,  they  both  depend 
on  the  value  function  (or  rather,  the  gradient  of  the  value  function).  The  sign  difference 
in  the  feedback  control  laws  arises  from  the  zero-sum  nature  of  the  game,  where  both 
players  are  optimizing  the  same  objective  function,  though  in  different  directions.  Where 
the  minimizing  player  u  follows  a  gradient  descent  of  the  value  function,  the  maximizing 
player  w  follows  a  gradient  ascent. 

It  is  also  worth  noting  that  the  feedback  control  laws  obtained  in  (3.19)  and  the  asso¬ 
ciated  Isaacs  equation  (3.21)  do  not  explicitly  depend  on  the  parametric  uncertainty  term 
F(x,  9,  t ).  As  we  assumed  that  the  squared-norm  of  F(x,  6*,  t)  is  bounded  by  a  sphere,  the 
uncertainty  was  taken  into  account  in  the  formulation  of  the  Isaacs  equation  that  pro¬ 
duced  the  value  function  as  a  solution,  which  accounts  for  all  possible  forms  of  F(x,9,t ) 
that  satisfy  the  bounded  norm  assumption. 

Theorem  3.3.  (Necessary  and  sufficient  condition  for  optimality).  The  Hamilton- Jacobi- 
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Isaacs  equation 


is  a  necessary  and  sufficient  condition  for  optimality  for  the  optimization  problem  (3.4), 
subject  to  the  system  dynamics  (3.1). 

Proof.  Since  we  derived  the  Isaacs  equation  using  first-order  necessary  conditions  on  the 
Hamiltonian,  it  is  clear  that  (3.21)  is  a  necessary  condition  for  ( u*,w *)  to  be  an  unique 
optimum.  To  show  that  the  Isaacs  equation  is  also  a  sufficient  condition  for  optimality,  we 
use  (3.18) 


sup  H(x,  Vj,  u,  w,  9)  <  Vx (x)f(x)  + 


(x)f(x)  +  \]-Vx  (x)H2{x)H%  (x)Vj(x)  +  J/32^  (x)E1(x) 


wGW 


and  note  that  the  right  hand  side  is  the  same  as 


H(x,Vx,u,w,9)  =1  h  H2(x)Hj(x)  -  g2(x)g%(x)  Vj  (x) 


\ hT(x)h(x )  +  i/3' 2El(x)E1(x )  -  ^  w(t)  -  ^ 

\vx(x)g2(x)  \il  P-^Etx(x)E2(x)\  gl(x)Vj(x). 


wit) 


2 


(3.23) 
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It  follows  that  since 


Vx(x)f(x)  +  ~ Vx(x )  gi(x)gj(x)  +  H2(x)H2 (x)  -  g2(x)g2(x)  Vx(x) 

+  ^ hT(x)h(x )  +  (x)E1(x)  <  0,  (3.24) 

The  first  half  of  the  expressions  in  the  Hamiltonian  reduces  to  zero  and  the  rest  simplifies 
to 


w(t)  -  \gl  (: x)v. J (x)  -  \vx(x)g2(x) 

z 


31  +  (x)E2(x)  g2{x)Vj(x)  <  0. 


This  indicates  that  (3.21)  is  also  a  sufficient  condition  for  optimality. 


(3.25) 

□ 


As  is  typical  in  optimal  control  formulations,  the  value  function  here  is  a  natural  Lya¬ 
punov  function  candidate.  While  it  is  trivial  to  verify  the  asymptotic  stability  of  the  closed 
loop  system  with  no  disturbance,  let  us  consider  the  interpretation  of  the  value  function 
assuming  a  disturbance  exists.  By  rearranging  (3.6),  we  can  obtain 


V(x(tf),tf)  -  V(x(t0),t0)  < 


ftf  1  r 

Jto  2L 


72|Kf)||2-||*(f)||5 


dt. 


(3.26) 


Taking  the  limit  as  as  tf  — >  oo,  the  value  function  is  zero  at  terminal  time  by  definition. 
Splitting  the  integral  and  rearranging  again,  we  obtain 


1 

2 


^(t)||2  dt  < 


\  [  72||m,(^)I|2  dt  +  V(x(to),to). 

1  J  to 


(3.27) 


This  is  exactly  the  definition  of  the  C2-gEm  defined  in  (1.5),  indicating  that  the  existence 
of  the  value  function  for  this  system  implies  that  the  /VgCn  is  bounded  by  7. 


Jiuguang  Wang 


61 


Numerical  Nonlinear  Robust  Control  with  Applications  to  Humanoid  Robots 


3.3  Value  function  approximation 

As  discussed  in  Section  1.2.2,  analytical  solutions  to  the  Isaacs  equations  cannot  be  ex¬ 
pected  for  general  nonlinear  systems.  Instead,  we  pursue  the  idea  of  value  function  approxi¬ 
mation,  reviewed  in  Section  1.2.3.  Given  the  Isaacs  equation  in  (3.21),  the  Galerkin  method 
provides  a  convenient  way  of  producing  a  discrete  representation,  where  approximations 
to  the  value  function  can  be  carried  out. 

The  Galerkin  method  is  suitable  for  operator  equations  of  the  form 

T{V{x))  =  l(x),  a<x<b,  (3.28) 

where  T  is  a  linear  operator,  V  is  the  unknown  variable  to  be  determined,  and  l(x)  is  a 
function  in  terms  of  the  independent  variable  x,  bounded  above  and  below  by  a  and  b, 
respectively.  The  boundary  condition  is  defined  as 

S(x)  =  0.  (3.29) 

In  the  case  of  the  Isaacs  equation,  which  is  a  first-order  nonlinear  partial  differential  equa¬ 
tion,  T{V)  is  an  operator  involving  the  partial  derivatives  of  value  function  V(x)  and  l(x) 
contain  miscellaneous  terms  independent  of  the  value  function. 

Following  the  approach  presented  in  Chapter  2,  we  approximate  the  value  function  V 
as  a  weighted  sum  of  a  set  of  basis  functions  </>,. 

Definition  3.4.  (Approximate  value  function).  The  approximate  value  function  V(x)  is 

n 

V{x)  =  ^Wifaix),  (3.30) 

2=1 

where  <fi(x)  is  a  vector  of  basis  function  with  corresponding  weights  wt . 
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We  seek  an  approximation  in  the  least-squares  sense 


\\V(x)  -  V*(x)\\2tW=  min  \\V(x)  -  V(x)\\2,w.  (3.31) 

V(x) 


That  is,  the  optimal  approximation  V*(x)  minimizes  the  weighted  /W11**™1  of  the  approx¬ 
imation  errors. 

Let  the  approximation  error,  also  known  as  the  residual,  take  the  form 


R(x)  =  t(  ^Tw^x)  J  -  l{x). 


(3.32) 


2=1 


The  Galerkin  method  requires  that  the  error  be  orthogonal  to  a  weight  function  in  the 
form 

(3.33) 


Wi  =  =  (j>i(x). 


duij 


which  is  simply  the  basis  function,  i.e., 


Rn(x)(f>i(x )  dx  —  0,  for  i  =  1, . . . ,  n. 


(3.34) 


This  produces 


<f>i(x)l(x)  dx  =  0, 


for  i  —  1, 


n. 


(3.35) 


In  the  case  where  the  operator  T(V)  does  not  include  time  derivatives,  the  Galerkin  method 
produces  a  set  of  n  algebraic  equations.  This  is  the  case  in  the  Isaacs  equation.  Otherwise, 
it  produces  a  set  of  n  ordinary  differential  equations,  with  n  unknown  coefficients  for  the 
basis  functions.  In  general,  the  integrals  can  be  evaluated  numerically  with  quadratures, 
and  the  resulting  problem  is  then  a  nonlinear  least-squares  problem  for  the  coefficients  wt . 
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We  are  now  in  a  position  to  consider  the  computational  issues  in  choosing  an  approxi¬ 
mate  representation  for  the  value  function.  There  are  many  ways  to  select  a  basis  function. 
For  simple  problems  with  linear  dynamics,  directly  using  the  dimensions  of  the  states  as 
basis  functions  would  suffice,  i.e., 


4>o{x )  =  1 

<t>i{x)  =  Xi.  (3.36) 

Clearly,  this  simple  scheme  cannot  represent  complex  value  functions  for  nonlinear  systems. 
An  immediate  extension  is  using  a  k- th  order  polynomial  as  a  basis  in  the  form 

n 

Mx)  =  hbA  mj  =  [l...k],  (3.37) 

3= 1 

but  this  choice  is  subject  to  Runge’s  phenomenon,  as  discussed  in  Section  2.1.  For  a  broad 
overview  of  various  function  approximation  schemes  in  the  context  of  dynamic  program¬ 
ming  and  reinforcement  learning,  we  refer  the  reader  to  Kober  [36]. 

In  this  work,  we  utilize  Fourier  approximations  in  connection  with  the  spectral  ap¬ 
proximation  framework  presented  in  Chapter  2.  As  discussed  in  Section  2.5,  Fourier  basis 
functions  enjoy  the  property  of  spectral  accuracy,  which  results  in  the  exponential  conver¬ 
gence  of  the  approximation  when  the  order  of  the  Fourier  series  is  increased.  As  discussed 
in  Konidaris  [131],  there  are  very  few  existing  applications  of  Fourier  approximation  in 
dynamic  programming  and  reinforcement  learning. 

We  first  consider  Fourier  approximation  for  single  dimensional  functions.  Given  a  single¬ 
dimensional  function  V(x),  its  n-tli  degree  Fourier  expansion  with  period  T  is  a  linear 
combination  of  sinusoidal  functions. 

Definition  3.5.  (Single-dimensional  Fourier  expansion).  For  any  single-dimensional 
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function  V(x),  the  Fourier  expansion  V{x)  takes  the  form 


W)  =  f+£ 

k= 1 


ak  cos  (^yx)  +  ^  sin 


(3.38) 


where  the  coefficients  ak  and  bk  are  given  by 


ak  =  t  J  cos  iklT'x)  dx 

2  rT  .  2 \ 

bk  =  —  J  V{x)  sin  yk—xj  dx. 


(3.39) 


For  an  unknown  V  (x) ,  we  can  treat  the  coefficients  ak  and  bk  as  values  to  be  learned  in  a 
linear  function  approximation  using  the  basis  6l  in  the  form 


{1  i  =  0 

cos(^7rx)  i>0,ifiisodd  •  (3.40) 

sin(|7ra;)  i  >  0,  if  *  is  even 


For  a  multivariate  function  V (x)  with  period  T  in  m-dimensions,  we  have 

Definition  3.6.  (Multi-dimensional  Fourier  expansion).  For  any  multi-dimensional  func¬ 
tion  V(x),  the  Fourier  expansion  V(x)  takes  the  form 


v(x)  =  £ 


(3-41) 


with  a  vector  of  coefficients  c  =  [c\ , . . . ,  cm  . 


For  an  n-th  order  Fourier  approximation  in  m-dimensions,  this  results  in  2(n+l)m  basis 
functions,  which  is  an  exponential  explosion  of  parameters  in  the  number  of  dimensions. 
There  are  several  ways  that  we  can  reduce  the  number  of  basis  functions.  First,  we  note 
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that  V  ( x )  can  be  even  if 


V(x)  =  V(-x) 


and  odd  if 

V(x)  =  -V(-x). 


(3.42) 


(3.43) 


In  these  cases,  the  symmetry  in  the  function  reduces  the  coefficients  to  a*  =  0  and  bi  =  0, 
respectively.  This  means  that  the  corresponding  sin  and  cos  terms  can  be  dropped,  which 
reduces  the  number  of  basis  functions  to  [n  +  1)  for  a  single  dimension  in  the  form 


<pi(x)  =  COs(7TC  •  x). 


(3.44) 


To  handle  the  exponential  nature  of  the  growth,  we  use  a  common  technique  in  function 
approximation  called  variable  coupling,  which  removes  the  simultaneous  contribution  of 
multiple  state  variables.  For  example,  in  a  two-dimensional  problem  with  a  polynomial 
basis  function  that  contains  the  terms  l,xi,x2,  and  XiX2,  we  assume  that  aq  and  x2  con¬ 
tributes  to  the  approximation  independently  and  uncouple  the  variables  by  dropping  the 
X\X2  term.  In  the  context  of  Fourier  basis  functions,  we  obtain  an  uncoupled  basis  by 
requiring  that  only  one  element  is  c  is  non-zero.  For  a  k- th  order  approximation  with  n 
states,  this  results  in  n(k  +  1)  number  of  basis  functions.  Table  3.1  lists  the  coupled  and 
uncoupled  Fourier  basis  functions  for  a  two-dimensional  system  with  orders  1,2,  and  3. 


3.4  Example  -  two-dimensional  uncertain  system 

3.4.1  Dynamics 

In  this  section,  we  solve  an  example  'HO0  robust  control  problem  under  the  differential  game 
formulation,  using  the  derivations  of  the  necessary  and  sufficient  conditions  for  optimality, 
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as  well  as  parametric  approximations  to  the  value  function. 

Consider  the  robust  control  problem 

1  ftf 

J(x,  u,  w,  9)  =  min  max  -  / 
ugu  wgw  2  K 
oee  to 

subject  to  the  dynamics 
Xi(t)  =  x2(t) 

^  +  9(t)^j  x\ (t)  -  ^x\ (t)  -  .(•,(/)  +  x2(t)w(t)  +  (1  +  9(t))  u 

z(t )  =  [x2(t),u(t)]T 

x(t0)  =  x0 

7  =  1,  /3  =  1.  (3.46) 

We  can  convert  this  system  to  the  matrix  control-affine  form 


\z(t)\\2-i2MtW 


dt, 


(3.45) 


x(t)  =  [f(x)  +  A f(x,  9 ,  t)]  +  gi(x)w(t )  +  [g2(x)  +  A g2(x,  9 ,  t)]  u(t ) 
z(t)  =  [h(x),  u(t)]T 

x(t0)  =  x0,  (3.47) 

with  A f(x,9,t)  and  A g2(x,9,t)  in  parameterized  form 

A  f(x,9,t)  =  H2(x)F(x,9,t)Ei(x) 

Ag2(x,  9 ,  t)  =  g2(x)F(x,  9,  t)E2(x),  (3.48) 
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and  |  \F(x,  6,  t)\ |2<  1.  The  system  matrices  are  given  by 


f(x) 


x2(t) 

\x\(t)  -  lxl(t)-x1(t) 


9i(x)  =  ,  92{x) 

x2(t) 


0 


1 


H2(x)  =  0  —  x\(t) 

Ei(x)  =  x2(t),  E2(x)  =  1,  h(x)  =  x2(t),  F(x,9,t)  =  9(t) 


(3.49) 


The  uncertain  term  9{t)  can  take  on  the  form  of  any  function  (time- varying  or  not),  as¬ 
suming  that  the  squared-norm  of  the  function  is  bounded  according  to  the  assumptions 
given.  An  example  is  9(t)  =  O.Olsin(t),  which  can  be  interpreted  as  a  small  perturbation 
added  to  the  nominal  system  parameter. 

3.4.2  Necessary  and  sufficient  conditions 

Following  the  definitions  in  Section  3.2,  the  saddle-point  equilibrium  of  the  problem  is 
given  by  the  value  function 


(3.50) 


ugu  2  Jj. 

ees 


where 


are  the  partial  derivatives  of  the  value  function  with  respect  to  the  state  variables.  The 
necessary  and  sufficient  condition  for  optimality  is  the  Hamilton-Jacobi-Isaacs  equation, 
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given  by 


Vx(x)f(x)  +  -Vx(x) 


■\gi(x)gl(x)  +  h2(x)hJ  (; x ) 
.7 


)  2I+-EZ(x)E2(x)  \gl(x) 


v^)*-2^)  +  -2eMx)<  0. 


Substituting  the  system  matrices,  we  have 


(3.52) 


V22(a)  (^2  +  ^i-2) 


2  +  Ki  0*0  x2  +  xl  —  VX2  (x)  (  +  x\  )  <  0. 


(3.53) 


By  assuming  a  positive-definite  value  function  candidate  in  the  form 


V(x)  =  i(,T?  +  j|) 


(3.54) 


with  the  costate 


Vx(x)  =  [xi  x2], 


(3.55) 


we  can  show  that 


x\  (x\  +  x\-2) 


Xn  Xn 


X2  (  —  +  —  +  Xl  )  +  X\  x2  +  x2  —  0, 


(3.56) 


which  satisfies  the  HJI.  Hence,  the  optimal  controls  are  given  by 

1 

^27  + 

1 


«*(«)  =  -  (  2I+--El(x)Ei(x)  )  £vj(x) 


»■(*)  =  yji  (UK  M, 


(3.57) 
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which  in  this  case  is 


u*(t)  =  —2x2 

w*(t )  =  x\.  (3.58) 


3.4.3  Value  function  approximation 


We  follow  the  approach  presented  in  Section  3.3  to  solve  the  problem  (3.45),  subject  to 
the  dynamics  (3.46).  For  simple  problems  such  as  this,  we  can  in  fact  solve  the  value 
function  approximation  problem  almost  analytically.  In  this  example  problem,  we  use 
both  a  traditional  polynomial  as  well  as  the  Fourier  basis  function. 


Given  the  approximate  value  function  V  (x)  as  a  weighted  sum  of  a  set  of  basis  functions 

4>i 

n 

V(x)  =  (3.59) 

2=1 

we  first  use  the  approximation 


~  z  \  o  o  o  o 

V  ( X )  =  Wq  +  W\Xi  +  W2X2  +  WzX1X2  +  W^xl  +  W5X2 


(3.60) 


with  the  derivative 


Vx(x)  =  2w^x\x\  +  w\  +  2u;4Xi,  2^33:2^1  +  W2  +  2w5X2 


(3.61) 


Let  the  residual  function  take  the  form 


R(x)  =  tI  \  -l(x). 


(3.62) 


2=1 


In  this  case,  this  is  simply  the  Isaacs  equation  evaluated  with  the  approximated  value 
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function  (3.60),  which  results  in 

R(x)  =  x2  (2  w3  X\  X2  T  W\  T  2  W4  .rq)  +  x\ 


+  (;?4  +  ^2  -  2)  (2  u>3  £2  +  w2  +  2  u>5  x2)  (w3  x2  x\  +  +  w5  £2 

Immediately,  we  notice  that  w0  disappears.  By  setting 

R(0)  =  0, 


we  also  find  that  —w%  =  0,  hence  w2  =  0.  To  find  the  remaining  weights  wq,  w3, 
we  generate  the  system  of  equations 

b 

Rn(x)4>i(x)  dx  =  0,  for  i  =  l,...,n. 

This  produces  four  equations 


II  R(x)  (wiXi^J  dx  1  dx2  =  0 


Normally,  these  integrals  would  be  evaluated  numerically  with  quadratures,  but 


(3.63) 


(3.64) 
w4,  w5, 


(3.65) 


(3.66) 
in  this 
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case,  the  equations  are  simple  enough  to  solve  analytically,  which  results  in 


34  W32 

34  W>3  105 

5  w3 

34  W52 

53  W5  I  W\  I  W4  I  1 

315 

105 

42 

105 

105  t  4  t  3  t  6 

92  W32 

184  W3  w 5 

16  W3 

92  W52 

317  W5 

,  Wl  ,  W4  1  1 

12  “r  8  T  15 

2205 

1575 

315 

945 

1512 

68  W32 

136  w 3 

463  W3 

68  W52 

51  W5 

l  W%  i  W4  1  1 

'  6  '  4  '  9 

735 

525 

4200 

315 

140 

92  W3  2 

184  W3  W5 

65  W3 

92  W52 

127  W3 

-L.  Wl  1  W4  t  1 

4  4  5 

1575 

945 

1512 

315 

252 

which  yields  the  solution  w\  =  0,  uq  =  0,  uq  =  0.5,  and  w§  =  0.5,  i.e., 

V(x)  =  ^{xl  +  xl). 


(3.67) 


(3.68) 


Now  let  us  consider  a  more  general  case  of  value  function  approximation  in  the  form 

n 

V{x)  =  ^2'wicf)i(x),  (3.69) 

2=1 

with  <f>i(x)  given  by  Fourier  basis  functions.  In  this  case,  the  system  of  integral  equations 
must  be  solved  numerically,  with  a  Gaussian  quadrature  embedded  within  a  trust  region 
dogleg  iteration  to  solve  the  system  of  equations. 

To  approximate  the  value  function,  we  used  the  Fourier-basis  functions  described  in 
Section  3.3  in  both  coupled  and  uncoupled  settings.  For  a  15-degree  Fourier  basis  ap¬ 
proximation  of  a  two-dimensional  system,  there  are  256  coupled  basis  functions  and  31 
uncoupled  basis  functions.  Because  the  exact  form  of  the  value  function  is  in  fact  uncou¬ 
pled,  we  expect  that  the  uncoupled  Fourier  basis  will  converge  faster. 

We  searched  the  coefficients  for  the  basis  functions  for  500  iterations  and  generated 
Fig.  3.1.  Indeed,  the  coupled  Fourier  basis  Fig.  3.1(b)  was  slower  to  produce  the  same 
approximation  compared  to  the  uncoupled  basis  Fig.  3.1(c).  Using  the  same  number  of 
iterations,  the  approximation  quality  for  the  coupled  basis  is  four  orders  of  magnitude 
worse  than  the  uncoupled.  When  compared  with  the  exact  value  function  Fig.  3.1(a), 
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(a)  Exact  (b)  Coupled  Fourier-basis 


(c)  Uncoupled  Fourier-basis 


0.1 

0.05 

0 

-0.05 

-0.1: 


(d)  Coupled  Fourier  basis  residuals  (e)  Uncoupled  Fourier  basis  residu¬ 
als.  Note  the  changed  vertical  scale 
compared  to  Fig.  3.1(d). 

Figure  3.1:  Exact  value  function,  Fourier-basis  approximation,  and  their  residuals. 

the  residuals  of  the  coupled  and  uncoupled  approximations  are  given  in  Fig.  3.1(d)  and 
Fig.  3.1(e). 

Using  the  initial  condition  x0  =  [1  1]T,  we  can  simulate  the  open-loop  (without  con¬ 

trols)  response  of  the  system  under  external  disturbance  and  verify  that  the  system  is 
unstable,  as  shown  in  Fig.  3.2(a).  Using  the  control  laws  (3.19)  in  conjunction  with  the 
Fourier  approximated  value  function  in  Fig.  3.1(c),  the  closed-loop  response  of  the  system 
is  shown  in  Fig.  3.2(b),  assuming  no  external  disturbance  and  no  parametric  uncertainties. 
This  is  consistent  with  the  theoretical  analysis  under  Lyapunov  stability,  where  we  showed 
in  (??)  that  using  a  value  function  as  a  Lyapunov  candidate,  the  resulting  system  is  stable 
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(a)  Open-Loop  response. 


(b)  Closed-Loop  response  with  no  disturbance. 


Figure  3.2:  Open  and  closed-loop  response  of  a  two-dimensional  system  under  disturbance. 
Without  controls,  the  system  does  not  converge.  With  controls  but  without  disturbances 
and  uncertainties,  the  system  is  stable. 


with  and  without  disturbances. 

We  can  further  verify  the  performance  of  the  control  design  under  bounded  external 
disturbances  and  parametric  uncertainties.  With  6(t)  =  O.Olsin(t),  Fig.  3.3  demonstrates 
the  closed-loop  response  of  the  system.  Fig.  3.3(a)  shows  the  response  of  the  system 
under  optimal  disturbance,  with  w(t)  computed  from  (3.19),  and  Fig.  3.3(b)  shows  the 
response  under  a  non-optimal  disturbance  in  the  form  of  a  sinusoidal  signal.  In  both 
cases,  the  feedback  control  law  was  able  to  stabilize  the  systems  given  the  disturbances 
and  uncertainties. 
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(a)  Closed-loop  response  with  optimal  disturbance,  (b)  Closed-loop  response  with  sinusoidal  distur¬ 
bance. 

Figure  3.3:  Closed-loop  response  of  a  two-dimensional  system.  The  feedback  control  is 
able  to  stabilize  the  system  under  both  optimal  and  sinusoidal  disturbances. 

3.5  Summary 

In  this  chapter,  we’ve  derived  necessary  and  sufficient  conditions  for  optimality  for  a  'HO0 
robust  control  problem  in  terms  of  a  Hamilton-Jacobi-Isaacs  equation.  A  value  function  was 
given  as  the  solution  of  the  Isaacs  equation,  from  which  optimal  feedback  control  laws  were 
derived  for  both  players.  For  practical  solutions  of  the  Isaacs  equation,  we  presented  a  value 
function  approximation  approach,  which  relied  on  Fourier  basis  functions  in  connection 
with  Galerkin’s  method,  under  the  general  spectral  approximation  framework  presented  in 
Chapter  2.5. 
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Ultimately,  for  systems  of  complex  nonlinear  behaviors  and  increased  dimensionality, 
the  scalability  of  value  function  approximation  is  limited.  One  effective  way  of  avoiding  the 
exponential  explosion  of  computational  demand  associated  with  searching  for  global  value 
functions  is  to  only  search  for  locally  optimal  solutions.  In  the  next  chapter,  we  explore 
the  class  of  trajectory  optimization  techniques  to  solve  larger-scale  problems  which  out  of 
the  reach  of  function  approximation-based  methods. 
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4.1  Motivation 

In  the  previous  chapter,  we  have  examined  'Hoc  robust  control  from  the  perspective  of 
dynamic  programming  and  derived  solutions  based  on  necessary  and  sufficient  conditions 
for  optimality  given  parametric  uncertainty  and  external  disturbances.  We  derived  global 
solutions  to  this  problem  using  the  exact  Bellman  value  function  as  well  as  its  parametric 
approximations.  Though  the  approach  characterized  global  solutions,  its  practical  imple¬ 
mentation  is  limited  by  the  complexity  of  the  underlying  dynamic  system. 

It  is  well-known  in  optimal  control  that  bypassing  the  curse  of  dimensionality  is  a 
difficult  endeavor.  In  the  value  function  approximation  framework  in  Section  3.3,  we  for¬ 
mulated  the  notion  of  decoupled  basis  functions,  which  mitigates  the  effects  of  the  curse  of 
dimensionality.  Ultimately,  for  systems  of  complex  nonlinear  behaviors  and  increased  di¬ 
mensionality,  the  scalability  of  value  function  approximation  is  limited.  One  effective  way 
of  avoiding  the  exponential  explosion  of  computational  demand  associated  with  searching 
for  global  value  functions  is  to  only  search  for  locally  optimal  solutions. 

Trajectory  optimization  refers  to  methods  where  we  seek  to  obtain  only  open-loop  tra¬ 
jectories  that  satisfy  certain  optimality  conditions.  Approaches  to  trajectory  optimization 
can  be  divided  into  direct  and  indirect  methods,  based  on  the  order  in  which  necessary 
conditions  for  optimality  are  applied.  These  methods  are  reviewed  in  Section  1.2.5.  By 
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bypassing  the  need  to  solve  the  full  necessary  and  sufficient  conditions  for  optimality,  tra¬ 
jectory  optimization  is  a  viable  approach  for  avoiding  the  curse  of  dimensionality.  However, 
while  direct  trajectory  optimization  has  been  successful  in  addressing  optimal  control  prob¬ 
lems,  applying  it  to  robust  control  is  not  straightforward  and  a  naive  implementation  tends 
to  result  in  an  expensive  optimization  problem  (discussed  in  Section  1.2.5). 

In  this  chapter,  we  extend  direct  trajectory  optimization  to  'HO0  control  and  differential 
games  by  transforming  it  into  a  mixed  complementarity  problem  (MCP).  This  approach 
is  consistent  with  the  spectral  approximation  methods  presented  in  Chapter  2,  and  the 
resulting  discrete  optimization  problem  can  be  solved  efficiently  using  commercially  avail¬ 
able  solvers.  Consequently,  the  resulting  approach  is  suitable  to  be  applied  in  a  receding- 
horizon  control  setting,  which  utilizes  open-loop  trajectories  in  a  closed-loop  setting.  We 
show  that  the  approach  is  robust  against  bounded  external  disturbances  and  parametric 
uncertainties,  and  demonstrate  the  practical  implementation  of  the  approach  using  a  sim¬ 
ple  two-dimensional  system,  then  compare  the  results  against  the  global  solutions  obtained 
in  the  previous  chapter. 

4.2  First-order  necessary  conditions 

We  begin  the  formulation  of  our  trajectory  optimization  method  by  first  discussing  the 
indirect  approach.  In  Section  1.3,  we  formulated  a  robust  'H00-control  problem  with  the 
minimax  objective 


=  mm  max 

u£lA  wGW 

oee 


(4.1) 
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which  entails  a  two-player,  noncooperative,  zero-sum  differential  game  over  a  horizon  tf  > 
to,  subject  to  the  nonlinear  dynamics 


x{t)  =  f(x)  +  Af(x,  9,  t)  •!  gi(x)w(t)  f  g2{x)  +  Ag2(x,9,t)  u(t) 
z(t )  =  [h(x),  u(t)]T 
x(t0)  =  x0, 


(4.2) 


with  x(t)  G  X  as  the  state  vector,  z(t)  G  Z  is  the  output  vector,  u(t)  G  U  and  w(t)  G  W 
are  the  minimizing  and  maximizing  players,  respectively,  :r0  is  a  vector  of  initial  states,  and 
the  matrices  f(x),  g(x),  and  h(x)  are  the  system  matrices  that  are  smooth  vector  fields  not 
explicitly  parameterized  by  time.  The  parameter  9  G  0  is  the  uncertain  system  parameter 
which  belongs  to  the  set 

0  =  {6>|0  <  9  <  9U]  (4.3) 

that  is  bounded  above  by  9U.  Parametrized  by  9,  the  functions  A f(x,  9,  t )  and  Ag2(x,  9,  t ) 
are  uncertain  functions  belonging  to  the  set  of  admissible  uncertainties. 

In  the  settings  of  the  two-player  differential  game,  open-loop  solutions  entail  that  both 
players  formulate  their  strategy  at  the  moment  the  system  starts  to  evolve,  based  on 
information  available  at  the  time:  the  system  dynamics,  the  objective  function,  and  initial 
conditions.  This  strategy  cannot  be  changed  once  the  system  evolves  and  the  saddle-point 
solution  is  the  combination  of  strategies  of  both  players  which  are  secured  against  any 
attempt  by  one  player  to  unilaterally  change  his  strategy. 

Let  us  first  consider  a  slightly  more  general  version  of  this  problem,  with  a  more  general 
form  of  the  objective  function  and  system  dynamics  as  compared  to  (4.1)  and  (4.2). 

Definition  4.1.  (Objective  function  -  primal).  The  optimization  objective  takes  the  form 

rtf 

J(x,  u,  w,  9,  t0,  tf)  =  Q>(x(to),  x(tf),  t0,  tf)  +  /  C(x,u,w)dt,  (4.4) 
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where  &(x(t0),x(tf),t0,tf)  is  commonly  referred  to  as  the  terminal  cost  and  C(x,u,w )  is 
the  one- step  cost  integrated  over  the  horizon  tf  >  t0.  We  assume  that  x(t)  =  0  is  an  unique 
equilibrium  point  of  the  system  with  u[t)  =  0  and  wit)  =  0. 

Definition  4.2.  (Uncertain  system  dynamics).  The  dynamics  of  the  system  is  given  by  a 
set  of  first-order  differential  equations 

x(t)  =  f(x,u,w,6),  (4.5) 

where  x  is  the  state  vector,  u  is  the  vector  of  controlled  inputs  to  the  system,  w  is  the 
vector  of  external  disturbances,  and  theta  is  the  time- varying  unknown  parameters  within 
the  system.  For  this  set  of  dynamics,  the  initial  condition  and  final  condition  are  expressed 
as 

e(x(t0),x(tf),t0,  tf)  =  0,  (4.6) 

which  is  a  boundary  constraint  at  both  the  initial  time  to  and  final  time  tf.  Additional 
path  constraints  take  the  form  of  an  inequality 

s(x,  u,  w )  <  0,  (4.7) 

which  represent  constraints  on  the  states  and  control  inputs.  We  assume  that  the  con¬ 
straints  on  the  states  and  control  inputs  are  fixed  and  contains  no  uncertainties.  We 
assume  that  /,  <L,  C ,  e,  and  s  are  nonlinear  and  smooth  functions  with  respect  to  x,  u, 
and  w . 

The  optimization  (4.4),  subject  to  the  dynamics  (4.5),  boundary  conditions  (4.6),  and 
path  constraints  (4.7)  is  called  the  primal  problem.  We  now  convert  this  constrained 
optimization  problem  into  an  unconstrained  problem  in  the  dual  form.  Let  us  define  a 
Lagrange  multiplier  v  to  combine  the  terminal  cost  (4.4)  and  terminal  constraint  (4.6)  in 
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the  form 


E(x(t0),x(tf),t0,  tf,  v )  =  $(x(t0),x(tf),t0,  tf )  +  vTe(x(t0),x(tf),  t0,  tf)  (4.8) 

and  define  the  Lagrange  multipliers  A  and  /i  to  incorporate  the  dynamics  (4.5)  and  path 
constraints  (4.7)  to  form  the  objective  function 


J(x,u,w,6,t0,tf )  =  E(x(t0),x(tf),t0,tf,v) 

rff 


+ 


'to 


C(x,u,w)  +  X1lf(x,u,w,9)—x(t))+^1s(x,u,w)  dt.  (4.9) 


Similar  to  (3.8),  we  use  the  Hamiltonian  to  express  the  concatenation  of  the  various  opti¬ 
mization  constraints. 


Definition  4.3.  (Hamiltonian).  The  Hamiltonian  H(x,u,w,9,  A, /r)  for  the  system  dy¬ 
namics  in  (4.5)  and  objective  function  (4.9)  is  a  function  of  the  states  x,  control  inputs  u 
and  w,  as  well  as  the  Lagrange  multipliers  A  and  //  in  the  form 


H(x,u,w,9,  A,  n)  =  C(x,  u,  w )  +  XT  ^  f(x ,  u,  w,  +  nT  (s(x,  u ,  w)^j , 


(4.10) 


where  C(x,u,w )  is  the  one  step  cost,  f(x,u,w )  is  the  system  dynamics  ,  and  s(x,u,w)  is 
the  path  constraint  set. 


In  this  context,  Lagrange  multipliers  v,  A,  and  fi  are  commonly  referred  to  as  the  co state 
or  adjoint  variables.  Substituting  the  Hamiltonian  into  the  objective  function,  we  have 


J(x,u,w,9,to,tf )  =  E(x(t0),x(tf),to,tf,v)  + 


9,  A,  ji)  —  X Tx(t)]  dt. 


(4.11) 
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Examining  the  last  term  in  the  integral,  we  can  express  A  in  terms  of  its  derivative  A  as 


rtf 


A T(t)x(t)  dt 


'to 


f  A T(t)x(t)dt 
Jto 

rff  . 

XT (tf)x(tf)  —  XT (to)x(to)  —  /  A T (t)x(t)  dt. 


(4.12) 


Effectively,  this  gives  us  the  dynamics  of  A  with  respect  to  the  state  x,  at  the  initial  and 
final  times  t0  and  tf,  respectively.  Rewriting  the  objective  function,  we  obtain 


Definition  4.4.  (Objective  function  -  dual)  The  optimization  objective  for  the  dual  prob¬ 
lem  takes  the  form 


J(x,u,w,9,t0,tf )  =E(x(t0),x(tf),t0,tf,v )  -  XT(tf)x(tf) 


r*f 


'to 


+  X1  (t0)x(to)  +  /  \H(x,  u,  w,  9,  A,  fi)  +  (t)x(t)\  dt ,  (4.13) 


where  E(x(t0),x(tf),t0,tf,v)  is  the  terminal  cost,  A  is  the  costate,  and  H(x,u,w,9,  A,  /i) 
is  the  Hamiltonian. 


Given  this  augmented  objective  function  with  constraints  adjoined  in  the  manner  of  La¬ 
grange,  we  are  now  in  a  position  to  derive  the  first-order  necessary  conditions  for  optimality 
using  the  calculus  of  variations.  The  first  variation  of  J  is 


SJ(x,  u,  w,  6,  to,tf )  = 


rtf 


dE  dE  T  rtf+stf 

—— — -dx(tf)  +  — — Stf  —  XT  (tf)Sx(tf)  +  /  C(x,u,w)dt 
ox{tf)  Otf  Jtf 

rtf 


+  /  Hx(x,u,w,  A,  ix)5x  dt  +  /  Hu(x,u,w,  X,  /x)Sudt 

Jto  Jto 

r^f  rff  r*f  . 

+  /  Hw{x,  u,  w.  A,  fi)Sw  dt  -f?  /  Hq(x,  u,  w,  A,  /i)59  dt  +  /  XT5xdt  =  0 ,  (4.14) 

Jto  Jto  Jto 
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where  the  various  derivatives  are  defined  as 
dE 


dE  dE 

?  •  •  *  ? 


dx{tf)  \_dxi(tf)  ’  "  ’  ’  dxNx(tf)\ 

dx(tf )  =  Sx(tf )  +  x(tf)Stf  =  8x(tf )  +  f(x,  u,  w,  9) 


St  4 


t=tf 


Hx(x,  u,  w,  9,  A,  /i)  = 
Hu(x,  u,  w,  9,  A,  /i)  = 
Hw(x,  u,  w,  9,  A,  /./)  = 
H$(x,  u,  w,  9,  A,  /i)  = 


'dH(x, 

u,  w,  9,  A,  [/,) 

dH(x,  u,  w,  9,  A,  /i) 

dx  i 

dx/v. 

' dH(x , 

u,  w,  9,  A,  /i) 

dH(x,  u,  w,  9,  A,  /x) 

<9wi 

duNu 

~dH(x, 

u,  w,  9,  A,  /i) 

dH(x,  u,  w,  9,  A,  /x) 

<9ici 

' dH(x , 

u,  w,  9,  A,  /x) 

dH(x,  u,  w,  9,  A,  /i) 

d9j\T0 

iT 


-|T 


lT 


Since 


ft/+% 


C*(x,  u,  w)  dt  =  C\t=tfStf, 


we  can  simplifying  and  collect  the  terms, 

I"  dE  1  I"  dE  dE 

6‘ 7  =  -  A  (f/)J +  [a*; +  a^n7)/(ir’ u’w’e)  +  c(a:’ 

rlf  .  ds 

+  /  (Hx(x,  u,  w,  9,  A,  /i)  +  XT)Sx  dt  +  /  Hu(x,u,w,9,  A,  /i)8udt 

J  to  J to 

rtf  rtf 

+  /  Hw(x,u,w,9,  X, /j,)Sw  dt  +  /  Hq(x,u,vj,9,  X,  ix)59  dt  =  0. 

Jto  J  U> 


(4.15) 


(4.16) 


t=t/J 


(4.17) 


The  extremum  of  the  function  J  is  obtained  when  the  first  variation  of  SJ  vanishes,  i.e., 


SJ(x,  u,  w,  9 ,  to,  tf )  =  0  (4-18) 

and  hence  the  necessary  conditions  for  optimality  can  be  established  by  setting  the  coeffi¬ 
cients  of  the  independent  variables  Stf,  Sx,  Su,  and  Sw  to  zero.  This  leads  to 
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Theorem  4.5.  (First-order  necessary  conditions  for  optimality).  For  the  optimization 
problem  (4.4),  subject  to  the  system  dynamics  (4.5),  boundary  conditions  (4.6),  and  path 
constraints  (4.7),  the  first-order  necessary  conditions  for  optimality  is  given  by  the  station¬ 
ary  conditions  for  the  Hamiltonian 


the  costate  equations 


8H(x,  u,  w,  9,  A,  ft.) 
du 

8H(x,  u,  w,  9,  A,  ft) 
dw 

8H(x,  u,  w,  9,  A,  fi) 
89 


0 

0 

0, 


(4.19) 


■T  8H(x,u,w,9,  A,  fi) 


X1  = 


A  T(tf)  = 


dx 


8E 

dx(tf)  ’ 


(4.20) 


and 


8E  8E  £. 

w; 1  a  /,  \  /  \X,  u,  w,  9)  +C{x,u,w ) 

dtf  dx(tf)  t=tf 


=  0. 


t=tf 


(4.21) 


We  omit  the  proof  here  as  it  is  trivial  to  verify  that  the  choice  of  the  costate  enforces 
8J(x,u,w,6,to,tf )  =  0. 


Intuitively,  these  results  are  consistent  with  the  costate  equations  in  Pontryagin’s  min¬ 
imum  principle  for  traditional  optimal  control  problems.  The  optimal  controls  u*,  w*,  and 
9*  are  found  by  minimizing  and  maximizing  the  Hamiltonian,  that  is, 


u*(t)  =  argmin  H(x,  u,  w,  9,  A,  /i) 

U 

w*(t )  =  arg  max  H(x,  u,  w,  9,  A,  /i) 

W 

9*(t )  =  argmaxi7(a:,  u,  w,  9,  A,  fi). 
0 


(4.22) 
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These  control  laws  are  similar  to  (3.11)  with  an  important  distinction  that  we  had  previ¬ 
ously  used  A  =  Vx(x),  where  Vx{x)  is  the  gradient  of  the  value  function.  Here,  as  value 
of  the  costate  is  obtained  by  solving  the  costate  equation  (4.20),  the  resulting  two-point 
boundary  value  problem  yields  open-loop  solutions  to  the  optimization  problem  (4.4). 

Beyond  systems  with  very  simple  dynamics,  the  two-point  boundary  value  problem  in 
(4.20)  must  be  solved  numerically,  via  the  so-called  shooting  methods  with  an  initial  guess 
for  the  ODE.  This  is  a  major  drawback  in  applying  the  indirect  method  -  the  quality  of 
the  solution  is  strongly  dependent  on  the  quality  of  the  initial  guess. 


4.3  Direct  trajectory  optimization 

To  improve  computational  efficiency  in  large-scale  problems,  we  wish  to  avoid  solving  the 
necessary  and  sufficient  condition  for  optimality  in  terms  of  the  Isaacs  equation  and  instead 
solve  this  problem  using  a  direct  numerical  optimization  approach.  However,  the  Isaacs 
equation  addressed  an  important  challenge  in  the  formulation  of  the  solution:  it  combined 
the  inner  maximization  problem  in  (4.1)  in  the  Hamiltonian  condition  (3.11),  and  the 
resulting  inequality  removed  the  need  to  address  the  maximization  problem  directly.  If  we 
bypass  the  Isaacs  equation,  then  we  must  find  alternative  ways  to  address  the  maximization 
operator. 

To  accomplish  this,  we  rely  on  an  algebraic  manipulation  which  transforms  the  minimax 
optimization  problem  into  a  minimization  problem  with  an  inequality  constraint  bounded 
by  the  maximum  value  of  the  original  maximization  problem.  Given  the  objective  function 

J(x,u,w,9)  =  min  max  - 

v  '  ueu  wew  2 

6>e0 


rtf 


'to 


dt , 


(4.23) 
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let  m*  bound  the  maximum  value  attained  by  the  inner  maximization  problem,  i.e., 


(4.24) 


Effectively,  this  can  be  viewed  as  an  inequality  constraint 

[lk(*)l|2-72|K*)H2]  dt.  (4.25) 

As  a  direct  consequence,  we  can  rewrite  the  original  minimax  optimization  (4.1)  as  a 
minimization  problem 


J(x,  u ,  w)  6) 


(4.26) 


=  mm  m 

uGU 


subject  to  the  constraint  (4.25),  which  ensures  that  the  maximum  value  attained  by  the 
disturbance  player  w  is  captured  in  m*,  which  is  then  minimized  to  yield  the  solution  to 
the  minimizing  player  u.  In  this  form,  (4.25)  is  referred  to  as  an  isoperimetric  constraint, 
since  it’s  an  inequality  constraint  with  an  integral  term.  Isoperimetric  constraints  are 
common  in  the  optimal  control  literature,  most  famously  in  the  Problem  of  Queen  Dido 
[14].  An  isoperimetric  can  be  treated  as  any  other  inequality  constraint  and  we  will  derive 
the  necessary  conditions  for  optimality  in  the  subsequent  sections. 

Similar  to  direct  trajectory  optimization  for  optimal  control  problems,  we  aim  to  di¬ 
rectly  discretize  the  trajectories  in  our  robust  control  problem.  This  is  done  in  connection 
with  the  least-squares  function  approximation  framework  discussed  in  Chapter  2.  We  ap¬ 
proximate  the  state,  input,  and  disturbance  trajectories  with  polynomials  in  a  least-squares 
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sense,  i.e.,  for  every  trajectory  Xi,  ux ,  and  w, ,  there  exists  a  polynomial  that  satisfies 


|| Xi(t)  -  x*(t) ||2,g  =  min  || Xi(t)  -  Xi(t) \\2,q 

Xi  (t) 

I \ui(t)  -  u*(t) \\2,q  =  min  || Ui(t)  -  Ui(t) ||2(9 

Ui(t) 

\wi(t)  -  w*(t)\\2,q  =  min  || w^t)  -  w^t) ||2(„  (4.27) 

Wi(t) 


defined  on  the  weighted  £2-norm  of  a  function  xl(t) 


Xi(t)\\2,q  = 


(Xi(t))  q(t)dt)  , 


(4.28) 


over  some  weight  q(x).  The  trajectories  v*  (r)  and  w*  (r)  are  defined  similarly  and  we  only 
describe  the  state  trajectory  approximation  below  for  brevity. 

To  use  this  framework  in  approximate  trajectories,  we  first  discretize  the  time  index 
from  [to,tf]  to  [—1,1].  As  shown  in  (2.36),  this  can  be  accomplished  through  a  simple 
linear  transformation 

in  -  ir,  in  -  ir, 

(4.29) 


tf  —  to  tf  —  to 
t  =  -J—L - T  + 


2  2 

Let  (ti,  . . . ,  rn)  be  the  n  collocation  points  in  the  domain  [—1, 1],  The  state  trajectory 
x*  (r)  is  a  linear  combination  of  basis  functions  in  the  form 


N 


X, 


(T)  =  ^cA{r), 


(4.30) 


2=1 


with  basis  functions  Ojir)  and  coefficients  c% .  We  use  Legendre  polynomials  [100]  as  the 
basis  functions,  orthogonal  with  respect  to  q{r)  defined  on  the  inner  product 


/ 


<Mr)0i(rMr) dx  =  = 


1  i  =  j 
0  i  ±  j 


(4.31) 
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where  5ij  is  the  Kronecker  delta  function.  The  corresponding  coefficients  are  given  by 

c*  =  f  0i(r)x(r)g(r)  dr,  i  =  0,  (4.32) 

J  a 

which  can  be  solved  using  Gaussian  quadrature,  resulting  in  the  Legendre-Gauss-Lobatto 
(LGL)  collocation  points  defined  on  the  interval  r  e  [—1,1]  [98].  Given  a  set  of  trajectories 
Xi(r),  we  use  the  notation  CV  to  denote  a  matrix  of  coefficients,  where  each  row  contains 
coefficients  q  for  a  single  trajectory  under  N  collocation  points.  The  objective  of  the 
discrete  optimization  is  to  search  this  collection  of  coefficients  to  generate  trajectories  that 
satisfy  the  state  dynamics  and  associated  constraints,  i.e., 

min  iP(Cn),  (4.33) 


subject  to 


©(CV)  <  o,  r(Cjv)  —  o,  Cn  g  b, 


(4.34) 


where  Q(Cjv)  and  T(Cjv)  represents  concatenated  inequality  and  equality  constraints,  re¬ 
spectively,  and  B  =  {Cn  e  Mn|r  <  C n  <  s}  with  rL  e  [—00,00]  and  s*  e  [r*,oo].  Let 
S  =  {Cpj  e  B\Q(Cn)  <  0,  T((7jv)  =  0}  denote  the  feasible  region.  The  Lagrangian  is 
defined  as 

L(Cn,  A,  v)  =  iP(CN)  -  A tQ(Cn)  -  vTr(CN),  (4.35) 


where  A  and  v  denote  the  Lagrange  multipliers  for  the  constraints.  According  to  the 
Karush-Kuhn- Tucker  (KKT)  conditions  [132],  the  first  order  necessary  conditions  are 


0  £  A cnL(Cn,  A,  v )  +  Ab(C'iv) 

0  >  A  _L  Q(Cn)  <  0 

r  (CN)  =  0,  (4.36) 
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where  N^Cn)  =  {z  £  Rn|(r/  —  Cn)tz  <  0,  Vy  £  B  is  the  normal  cone  to  B  at  CV  In  the 
case  that  r,  or  s*  is  finite,  we  can  rewrite  as 


{( VcnL(Cn ,  A,  u))i  >0  Xi  =  Ti 

(VCnL(Cn,  A,  v))i  <0  Xi  =  Si  (4.37) 

( VCnL(Cn ,  A,  v))i  =  0  ri<Xi< 

A  mixed  complementarity  problem  [133]  is  defined  as  the  problem  of  finding  a  point 
zGl"  inside  the  box  B  =  {z\— oo  <  l  <  z  <  u  <  oo}  that  is  complementary  to  a  nonlinear 
function  F(z )  when 

{■ Fi(z )  >0  Zi  =  k 

Fi(z)  <0  Zi  =  Ui  (4.38) 

Fi  (^)  0  li  ^  Zi  Ui- 

When  l  =  —  oo  and  u  =  oo,  the  solution  is  given  by  simply  solving  a  system  of  nonlinear 
equations  so  that  F(z)  =  0.  If  l  =  0  and  u  =  oo,  the  resulting  problem  is  a  Nonlinear 
Complementarity  Problem  (NCP)  such  that 


Zi>  0 

Fi(z )  >  0 


ZiFi(z)  =  0. 


The  nonlinear  MCP  function  can  be  written  as  a  vector 


1 

<1 

£ 

r 

s 

O(Cjv) 

,i  = 

— oo 

,  u  — 

0 

.  r(c'iv)  _ 

— oo 

oo 

(4.39) 


(4.40) 
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where 


Vc„L(z)  =  VcniHCn)  -  A TVc„e(CV)  -  i>TVc„r(CV). 


(4.41) 


This  is  an  equivalent  condition  to  the  first  order  KKT  conditions  of  the  NLP,  which  can 
be  solved  by  a  MCP  solver  such  as  KNITRO  [134], 

4.4  Example  -  two-dimensional  uncertain  system 

In  this  section,  we  solve  an  example  'HOQ  robust  control  problem  under  the  differential 
game  formulation  using  trajectory  optimization.  Previously  in  Section  3.4,  we  considered 
the  robust  control  problem 


=  mm  max 

uEU  wGW 

eee 


(4.42) 


subject  to  the  dynamics 


x{t)  =  [f(x)  +  A f(x,  9,  t)]  +  5ri(x)w(t)  +  [g2(x)  +  A g2(x,  9 ,  t)]  u(t ) 
z(t)  =  [h(x),  u(t)]T 
x(t0)  =  x0, 


(4.43) 


with  A f(x,9,t)  and  Ag2(x,6,t)  in  parameterized  form 


A  f(x,9,t)  =  H2(x)F(x,9,t)Ei(x) 
Ag2(x,  9,  t )  =  g2(x)F(x,  9 ,  t)E2(x), 


(4.44) 
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and  |  \F(x,  6,  t)\ |2<  /?.  The  system  matrices  are  given  by 

0 

,  92\x)  =  , 

Ei(x)  =  x2(t),  E2(x)  =  1,  h(x)  =  x2(t),  F(x,6,t )  =  6(t)  (4.45) 

The  uncertain  term  9(t)  can  take  on  the  form  of  any  function  (time- varying  or  not),  as¬ 
suming  that  the  squared-norm  of  the  function  is  bounded  according  to  the  assumptions 
given.  An  example  is  9(t)  =  0.01  sin (4),  which  can  be  interpreted  as  a  small  perturbation 
added  to  the  nominal  system  parameter.  We  pursue  an  open- loop  solution  according  to  the 
direct  collocation  approach  described  in  Section  4.3  and  rewrite  the  optimization  objective 
according  to  the  complementarity  conditions  described.  The  resulting  problem  is  passed 
to  the  solver  KNITRO  with  trajectories  approximated  using  Legendre  polynomials. 

To  test  the  effectiveness  of  the  trajectories  against  bounded  disturbances,  we  implement 
a  receding- horizon  control  (RHC)  similar  to  the  setup  described  in  [105].  We  compute 
trajectories  for  the  system,  apply  the  obtained  control  inputs  for  a  fixed  duration,  then 
compute  new  trajectories  using  the  current  states  as  initial  conditions.  The  total  simulation 
time  is  10s  with  a  horizon  of  0.1s,  with  50  collocation  points  for  each  computed  trajectory. 
The  computation  time  for  each  trajectory  is  about  15s. 

Fig.  4.1(a)  gives  a  snapshot  of  the  RHC  at  the  beginning  of  a  horizon,  which  takes  in 
the  specified  initial  conditions  and  compute  trajectories  according  to  the  objective  function 
in  (4.42).  Compared  to  the  known  global  solution  from  Fig.  3.3(a),  the  state  trajectories 
from  Fig.  4.1(a)  exhibit  a  slightly  longer  settling  time.  This  is  a  direct  result  of  the  local 
nature  of  the  solution. 

One  advantage  of  the  trajectory  optimization  approach  is  the  ability  to  consider  con- 


f(x)  = 

H2(x)  = 


x2(t) 

~  \xl{t)~X1  (t) 

3  -  x\ (t) 


9i(x)  = 


0 

x2(t) 
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(a)  Unconstrained  input 


(b)  Constrained  input  to  ±0.5 


Figure  4.1:  Optimized  trajectories  for  a  two-dimensional  system  under  disturbance.  Given 
constraints  on  the  control  input,  stabilizing  trajectories  can  still  be  obtained. 


straints  on  the  states  and  control  inputs.  In  Fig.  4.1(b),  we  simulate  a  scenario  where  the 
control  input  was  constrained  to  be  ±0.5.  The  resulting  control  trajectory  briefly  saturated 
at  ±0.5,  but  was  still  able  to  stabilize  the  system. 

Fig.  4.2  shows  a  simulation  of  the  RHC  where  the  system  is  disturbed  using  the  optimal 
disturbance  computed  in  (3.58).  Under  the  optimal  disturbance,  the  settling  time  for 
the  states  are  longer.  Although  the  computed  trajectories  only  estimated  suboptimal 
disturbances,  in  a  closed-loop  setting,  the  RHC  was  able  to  stabilize  the  system  with  a 
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V  2  4  6  8  10 

t 


(a)  Closed-loop  response  with  optimal  disturbance,  (b)  Closed-loop  response  with  sinusoidal  distur¬ 
bance. 

Figure  4.2:  Receding-horizon  control  of  a  two-dimensional  uncertain  system.  The  closed- 
loop  response  is  stable  for  both  optimal  and  sinusoidal  disturbances. 


performance  close  to  the  globally  optimal  solution  in  Fig.  3.3(a). 

In  direct  trajectory  optimization,  the  initial  guess  often  plays  an  important  role  in 
determining  the  quality  of  the  resulting  solutions.  In  this  two-dimensional  example,  random 
initial  guesses  were  provided  to  the  solver  and  the  resulting  trajectories  were  feasible  though 
only  locally  optimal.  In  Chapter  6,  we  will  analyze  the  computational  performance  of  this 
robust  trajectory  optimizer  in  the  context  of  a  more  complex  dynamic  system  and  provide 
a  detailed  discussion  on  the  quality  of  solutions  and  computation  time.  In  addition,  we  will 
also  provide  comparisons  to  alternative  methods,  such  as  LQR  and  non- robust  trajectory 
optimization  in  a  receding-horizon  control  framework. 
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4.5  Summary 

In  this  chapter,  we  extended  direct  trajectory  optimization  to  nonlinear  robust  'HO0  control 
under  bounded  external  disturbances  and  parametric  uncertainties.  We  solved  the  result¬ 
ing  differential  game  with  open- loop  trajectories  obtained  in  accordance  with  necessary 
conditions  for  optimality.  The  proposed  framework  transformed  a  minimax  optimization 
problem  into  a  minimization  problem  with  complementarity  conditions.  The  resulting 
problem  was  solved  by  direct  collocation  pseudospectral  methods,  which  transcribed  a 
continuous-time  problem  into  an  equivalent  discrete  form  by  parameterizing  the  state  and 
control  spaces  using  global  polynomials  and  collocating  the  differential-algebraic  equations 
using  nodes  obtained  from  a  Gaussian  quadrature.  We  applied  the  open-loop  trajecto¬ 
ries  in  a  receding-horizon  control  setting,  which  was  shown  to  be  effective  on  a  simple 
two-dimensional  benchmark  problem. 

To  our  knowledge,  this  is  is  the  first  application  of  direct  trajectory  optimization  to 
the  robust  'HO0  control  problem.  Compared  to  a  naive  nonlinear  programming  approach, 
the  proposed  method  was  able  to  generate  trajectories  orders  of  magnitude  faster,  which 
makes  it  suitable  for  large-scale  systems.  The  trade-off  is  the  quality  of  the  solutions 
obtained:  for  computing  the  trajectories,  we  can  only  guarantee  that  it  satisfies  a  set 
of  first  order  optimality  conditions,  and  not  the  overall  stability  of  the  control  design. 
In  the  next  chapter,  we  extend  this  trajectory  optimization  approach  to  consider  multiple 
models,  which  removes  some  of  the  restrictive  assumptions  made  on  the  forms  of  the  system 
uncertainties. 
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5.1  Motivation 

This  chapter  proposes  a  new  method  for  solving  the  robust  T-Loo  control  problem  presented 
in  Section  1.3.  In  Chapter  3  and  Chapter  4,  we  developed  global  and  local  methods 
for  solving  the  robust  'HO0  control  problem  under  a  differential  game  formulation  with 
strong  assumptions  on  the  form  of  the  system  uncertainties.  We  modeled  the  external 
disturbance  as  a  perturbation  player,  aiming  to  maximize  the  disturbance  under  parametric 
uncertainties,  which  are  structured  with  bounded  norms.  In  reality,  the  structured  and 
additive  nature  of  these  modeled  uncertainties  may  not  be  able  to  account  for  all  forms  of 
uncertainties  with  potential  impacts  on  the  performance  of  the  system. 

As  discussed  in  Section  1.3.2,  unmodeled  dynamics  is  a  phenomenon  where  the  effects 
of  higher-order  dynamics  (such  as  actuator  dynamics)  have  a  significant  impact  on  the 
response  of  the  system.  To  compensate  for  unmodeled  dynamics  is  a  difficult  problem 
that  requires  a  more  flexible  representation  of  uncertainties  in  the  system  dynamics  than 
what  we  have  previously  proposed.  In  this  chapter,  we  develop  a  heuristic  approach  to 
robust  control  design  based  on  the  concept  of  multiple  models,  reviewed  in  Section  1.2.7. 
A  multi-model  design  evaluates  the  performance  of  a  given  control  law  over  a  distribution 
of  possible  system  dynamics,  which  allow  the  optimization  to  consider  the  expected  cost 
over  a  range  of  possible  outcomes  and  account  for  the  maximum  cost  for  the  worst-case 
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scenario.  We  consider  this  multi-model  design  in  the  context  of  differential  games  that 
we  have  developed  in  the  previous  chapters,  with  the  added  assumption  that  both  players 
now  consider  a  distribution  of  models  in  formulating  their  respective  strategies.  This  is  a 
philosophy  consistent  with  the  'H0 0  designs  proposed  in  the  previous  chapters. 

The  consequence  of  the  proposed  approach  is  a  method  that  is  able  to  address  all  three 
forms  of  uncertainties  discussed  in  Section  1.3.2:  external  disturbances,  parametric  uncer¬ 
tainties,  and  unmodeled  dynamics.  The  proposed  method  is  scalable  to  high-dimensional 
systems  by  building  on  the  robust  trajectory  optimization  techniques  developed  in  Chap¬ 
ter  4,  which  we  extend  to  compute  trajectories  that  simultaneously  satisfy  optimality 
conditions  for  multiple  models.  The  resulting  open-loop  trajectories  can  be  utilized  in  a 
receding-horizon  setting  to  enable  closed-loop  feedback  control.  We  illustrate  the  proposed 
approach  using  a  simple  two-dimensional  system  and  discuss  scalability  to  more  complex 
applications. 


5.2  Formulation 

In  the  previous  chapters,  we  formulated  a  robust  control  problem  with  the  nonlinear 
dynamics 


x(t)  =  f(x)  +  Af(x,6,t)  +  gi(x)w  +  g2(x)  +  Ag2(x,6,t) 
z[t)  =  [h(x),  u]T 
x(t0)  =  x0, 


u 


(5.1) 


with  x  £  X  as  the  state  vector,  z  £  2  is  the  output  vector,  u  £  IA  and  w  £  W  are  the 
minimizing  and  maximizing  players,  respectively,  xo  is  a  vector  of  initial  states ,  and  the 
matrices  f(x),  g(x),  and  h(x)  are  the  system  matrices  that  are  smooth  vector  fields  not 
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explicitly  parameterized  by  time.  The  parameter  9  £  0  is  the  uncertain  system  parameter 
which  belongs  to  the  set 

©  =  {0|O  <9<6U}  (5.2) 

that  is  bounded  above  by  9U.  Parametrized  by  9,  the  functions  A /  and  A g2  are  uncertain 
functions  belonging  to  the  set  of  admissible  uncertainties ,  defined  by 

Af(x,9,t)  =  H2(x)F(x,9,t)Ei(x) 

A g2(x, 9 ,  t)  =  g2(x)F(x,  9,  t)E2(x ),  (5.3) 

for  ||F(x,  6*,  t)||2<  (3,  which  intuitively,  assumes  that  the  uncertainties  are  bounded  by  a 
sphere.  Using  these  assumptions,  we  are  able  to  derive  necessary  and  sufficient  conditions 
for  optimality  in  terms  of  Isaacs’  equation  (3.21)  and  explicit  feedback  control  laws  (3.19) 
using  the  value  function.  In  an  open-loop  setting,  this  model  can  be  solved  using  both 
indirect  and  direct  trajectory  optimization,  described  in  Section  4.2  and  Section  4.3. 

In  this  chapter,  we  formulate  robust  control  problems  which  cannot  be  modeled  by 
(5.1).  In  particular,  we  aim  to  remove  the  restrictive  assumptions  on  the  structure  of 
the  uncertainties  in  (5.3).  It  is  clear  not  all  parametric  uncertainties  can  be  posed  in  an 
additive  way  to  the  dynamics,  as  represented  in  (5.1).  As  an  example,  consider  the  linear 
optimal  control  problem  posed  by  Atkeson  [127],  with  double-integrator  dynamics 

x  =  Ax  +  Bu ,  (5.4) 

where  the  system  matrices  are  given  by 


A  = 

1  T 

,  B  = 

T2/ 2 

0  1 

T 

Jiuguang  Wang 


99 


Numerical  Nonlinear  Robust  Control  with  Applications  to  Humanoid  Robots 


with  T  =  0.001s.  The  objective  function  is 


J  =  min  - 

u&U  2 


1  ftf 


xtQx  +  utRu )  dt, 


'to 


(5.6) 


with 


Q  = 


1000 

0 


R  =  0.001. 


(5.7) 


In  addition  to  this  nominal  model,  unmodeled  dynamics  include  a  second-order  low  pass 
filter  with  a  cutoff  frequency  of  10Hz,  under  the  dynamics 


a  M 


U1 


s 2  +  2y  ujs  +  oj 2 


(5.8) 


with  a  damping  ratio  7  =  1  and  a  natural  frequency  u  =  207T.  There  is  no  resonant 
peak  and  the  unmodeled  dynamics  acts  as  a  well-behaved  low  pass  filter.  As  shown  by 
Atkeson,  when  an  LQR  design  is  carried  out  using  the  nominal  model  without  considering 
the  unmodeled  dynamics,  the  resulting  controller  failed  to  stabilize  the  overall  system.  A 
multi- model  design  for  this  system  utilizes  two  models:  a  nominal  model  and  an  augmented 
model  with  an  input  filer  with  7=1  and  uj  =  107i\  The  resulting  controller  is  then  able  to 
stabilize  both  the  nominal  model  and  the  true  system  model.  Fig.  5.1  shows  the  closed-loop 
response  for  the  LQR  design  and  multi-model  design  for  both  the  nominal  model  and  true 
system  model. 

This  example  motivates  our  use  of  multiple  models  in  control  design.  By  evaluating 
the  performance  of  a  given  control  law  over  a  distribution  of  possible  system  dynamics,  we 
allow  the  optimization  to  consider  the  expected  cost  over  a  range  of  possible  outcomes  and 
account  for  the  maximum  cost  for  the  worst-case  scenario.  This  framework  removes  the 
need  to  to  formulate  uncertainties  in  the  form  (5.1)  and  allows  flexibility  in  dealing  with 
uncertainties  both  in  the  parameters  of  the  dynamics  and  in  the  model  structure. 
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(a)  Nominal  model  (b)  True  model  with  unmodeled  dynamics 

Figure  5.1:  Comparison  of  LQR  and  multi-model  control.  LQR  control  is  stable  only  for 
the  nominal  model,  while  a  multi-model  design  is  stable  for  both  the  nominal  and  the  true 
model. 

We  propose  to  extend  the  multi-model  framework  in  Atkeson  [127]  to  nonlinear  robust 
Hoo  control  under  a  differential  game  formulation.  In  this  setting,  both  players  now  con¬ 
sider  a  distribution  of  possible  system  dynamics  in  formulating  their  respective  strategies, 
with  the  ability  to  compensate  or  exploit  model  uncertainties  to  optimize  their  respective 
objective  functions.  The  resulting  equilibrium  is  a  robust  Nash  equilibrium ,  a  pair  of  strate¬ 
gies  (u*,  w*)  secured  against  any  attempt  by  one  player  to  unilaterally  change  his  strategy, 
considering  the  underlying  uncertainty. 
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The  system  dynamics  we  use  in  this  multi-model  design  is  given  by 

x(t)  =  f(x)  +  gi(x)w  +  g2(x)u 
z(t )  =  [h(x),u]T 

x(t0)  =  x0.  (5.9) 

This  is  a  simplihed  version  of  (5.1)  that  is  a  still  an  'Hoa  formulation  with  an  external 
disturbance  term  w,  but  without  modeling  the  parametric  uncertainties.  We  make  the 
same  assumptions  about  the  boundedness  of  the  disturbance,  that  is,  the  £2- gain 

ftf  rlf 

/  \\z{t)\\2  dt  <  /  \\w(t)\\2  dt  +  n(xo)i  \/tf  >  to,  (5.10) 

j  to  Jto 

is  bounded  by  7.  To  increase  robustness  against  uncertainties,  we  instead  define  a  family 
of  such  models. 

Definition  5.1.  (Multi- model  system  dynamics).  The  dynamics  of  the  multi-model  system 
is  given  by  a  set  of  first-order  differential  equations 

xa  =  fa(xa )  +  gi(xa)w  +  g%(xa)u 
za  =  [ha(xa),u]T 

xa(t0)  =  Xq,  (5.11) 

where  a  e  A  belong  to  a  finite  parametric  set  A.  Here,  xa  is  the  indexed  state  vector,  u  is 
the  vector  of  shared  controlled  inputs  to  the  set  of  systems,  and  w  is  the  vector  of  shared 
external  disturbances. 

Each  a  represents  a  particular  model  from  the  parametric  set  with  its  own  system  matrices, 
which  effectively  increases  the  range  of  modeling  options  without  requiring  the  uncertainty 
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terms  to  conform  to  (5.1).  For  the  purposes  of  being  consistent  with  previous  chapters,  we 
restrict  the  dynamics  to  be  affine  with  respect  to  the  control  inputs. 

We  also  augment  the  objective  function  based  on  this  distribution  of  models. 

Definition  5.2.  (Multi-model  objective  function).  The  multi-model  optimization  objec¬ 
tive  takes  the  form 


(5.12) 


given  a  scalar  7  and  a  time  horizon  tf  >  to,  over  a  distribution  of  models  a  e  A.  We 
assume  that  x(t)  =  0  is  an  unique  equilibrium  point  of  the  system  with  u(t)  =  0  and 
w(t)  =  0. 

Here,  we  minimize  the  worst-case  cost  of  the  family  of  dynamic  models  under  disturbance. 
Clearly,  this  remains  a  minimax  optimization  problem,  but  now  the  control  input  u  must 
stabilize  a  family  of  models,  from  which  the  worst-case  cost  is  derived  from. 

5.3  Multi- model  dynamic  programming 

Let  us  first  consider  the  interpretation  of  the  optimization  objective  in  (5.12)  from  the  view 
of  the  dynamic  programming  principle.  Similar  to  the  definition  of  the  value  function  in 
(3.5),  we  can  define  a  value  function  for  cost  (5.12)  associated  with  each  model  in  (5.11). 

Definition  5.3.  (Multi-model  Bellman  value  function).  The  multi-model  value  function 
V(xa,t)  defines  the  cost-to-go  from  any  initial  state  xa  and  any  time  t  such  that 


ml  sup 


(5.13) 
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In  other  words,  the  value  function  bounds  the  cost  and  provides  the  optimal  cost-to-go  for 
any  state  xa(t).  Given  a  family  of  such  value  functions  Va(xa,t),  we  can  generate  control 
laws  for  the  two  players  by  accounting  for  the  corresponding  state  at  each  individual  value 
function.  The  control  laws  take  the  form 

U0  =  argmm||yl|(||Z‘>(f)||2-72|Kt)||2)+gV'“(i“,i)| 

«-(*)  =  arg  max  j  |.4|  (||*“(t)||2-r,|H0ll2)  +  E  f>  j  •  (5-14) 

where  \A\  is  the  cardinality  of  A.  This  is  the  approach  taken  in  Whitman  [125]  for  multi¬ 
model  optimal  control  problems  and  we  have  extended  the  definition  of  the  value  function 
to  a  two-player  setting.  Given  that  we  have  changed  the  system  dynamics  (5.1)  into 
form  (5.11),  the  necessary  and  sufficient  conditions  we  have  derived  in  terms  of  the  Isaacs 
equations  (3.21)  no  longer  applies  to  this  system.  Thus,  we  must  first  derive  a  new  Isaacs 
equation  specific  to  the  system  dynamics  in  (5.11). 

We  first  define  the  equilibrium  strategies  in  the  context  of  (5.11).  For  each  model 
a  G  A,  the  interpretation  of  the  game  from  the  perspective  of  both  players  are  given  by 
the  objective  functions 


1 

rtf  r 

,  u ,  w)  —  min  - 
ueu  2  ^ 

u 

\\nm2-i2\ 

Ht)  ll2_ 

dt 

(5.15) 

1 

rtf  [ 

J2(t,Xa. 

,  u,  w)  —  max  - 

/ 

IK*)II2' 

dt. 

(5.16) 

wew  2 

JtQ  1 

These  opposing  objectives  are  related  by 

Ji(t,  xa ,u,w )  =  —J2(t,xa,u,w),  (5-17) 
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which  remains  a  noncooperative  zero-sum  game.  A  pair  of  strategies  ( u*,w *)  forms  a  robust 
Nash  equilibrium  if 


Ji(t,  xa,u*,w*)  <  Ji(t,xa,u,  w*),  Vu  £U,a  £  A  (5.18) 

J2(t,xa,u* ,w*)  >  J2(t,xa,u* ,w),  Mw  £  W,  a  £  A.  (5.19) 

Equivalently,  the  saddle-point  condition  indicates  that 

J(t,  xa ,u* ,w)  <  J(t,  xa,u* ,w*)  <  J(t,  xa,u,w*),  \/w  £W,u  £U,a  £  A.  (5.20) 


We  define  the  Bellman  value  function  as  a  saddle-point  equilibrium  solution  that  satisfies 

(5.21) 


V(x“,f)  =  inf  sup  1  f  [||z“(t)||2-72|Mt)||2]  dr 
w£W,a£A  ^  Jt 


and  we  use  the  notation 


u>(*v)  =  WW),  vt^,t)  =  dV^ 


dxc 


dt 


(5.22) 


to  denote  the  row  vectors  of  first  partial  derivatives  of  the  value  function  with  respect  to  t 
and  x,  respectively.  For  brevity,  we  drop  the  time  index  t  when  it  is  immaterial.  We  note 
that  the  value  function  bounds  the  cost  function 


Va(xa(tf),tf)  <  J*(t,xa,u*,w*)  <  Va(t0,XQ),  (5.23) 

where  u*  and  w*  are  the  optimal  controls  of  the  system  (w*  is  optimal  in  the  sense  that  it 
provides  the  worst-case  disturbance  to  the  system). 

Similar  to  Section  3.2,  we  use  basic  optimization  theory  to  derive  the  optimal  control 
and  the  associated  necessary  and  sufficient  condition  for  optimality.  First,  we  convert 
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the  constrained  optimization  problem  in  (5.12)  to  an  unconstrained  form  by  defining  the 
appropriate  Hamiltonian. 

Definition  5.4.  (Multi-model  Hamiltonian).  The  multi-model  Hamiltonian  H(xa,  (VX)T ,  u, 
for  the  system  dynamics  in  (5.11)  and  objective  function  (5.12)  is 

H(xa,(V*)T,u,w)  =  {V*)T(j(yx)  +  g!(x)w  +  g2(x)u)  +^\\za\\2-^2\\w(t)\\2,  (5.24) 

where  (VX)T  is  the  costate  for  each  model  a  G  A. 

A  necessary  condition  for  optimality  is  Isaacs’  condition,  which  states  that 

inf  sup  H(xa,  (VX)T ,u,w)  =  sup  inf  H(xa,  (VX)T,  u,  w),  (5.25) 

U&A  W£.y\)  W£\tyueU 

Equivalently,  this  can  be  expressed  as 

H(xa,(Vxa)T,u*,w)  <  H(xa,(Vxa)T,u*,w*)  <  H(xa,(Vxaf,u,w*).  (5.26) 


For  the  Hamiltonian  to  be  minimized  over  U,  the  set  of  all  possible  controls,  we  write 

minsupi/(a:",  (Vx)T,u,w)  =  0.  (5.27) 

u  w 

The  necessary  condition  for  optimality  is  written  as  the  partial  derivatives  of  the  Hamil¬ 
tonian  with  respect  to  the  control  inputs 

8H  ,  , 

sT“’”')=0 

r)  H 

—  (u,w*)  =  0.  (5.28) 
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These  conditions  give  the  control  inputs  with  respect  to  the  costate  variable  as 

w’(*’K)  =  ^sI(x)K)-  (5.29) 

Substituting  back,  we  obtain 

Theorem  5.5.  (Necessary  and  sufficient  condition  for  optimality).  The  multi-model 
Hamilton-Jacobi-Isaacs  equation 


Vta(xa,t )  +  V«(xa,t)fa(xa ) 


1 


GtffM  -  ga2{xa){g«yXxa) 


.7 


(U“)T(*“,i)  +  iw)TA?(x“)  =  0, 

(5.30) 


is  a  necessary  and  sufficient  condition  for  optimality  for  the  multi-model  optimization 
problem  (5.12),  subject  to  the  system  dynamics  (5.11). 


We  omit  the  proof  here  since  by  construction,  it  satisfies  the  first-order  necessary  conditions 
(in  (5.27))  and  the  sufficiency  condition  can  be  proven  the  same  way  as  the  single-model 
version  of  the  Isaacs  equation  given  in  (3.21). 

In  the  context  of  the  multi- model  optimization  problem  in  (5.12),  we  have  effectively 
reduced  the  problem  to  solving  a  series  of  Isaacs  equations  in  (5.30),  then  computing  the 
optimal  feedback  control  laws  in  (5.14).  In  each  case,  the  value  function  approximation 
approach  described  in  Section  3.3  is  fully  applicable.  Each  value  function  can  be  represented 
in  the  form 

n 

va(xa )  =  ^ ~^Wi4)i(xa ),  (5.31) 

2=1 
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and  the  feedback  control  laws  are  given  by 


w  = 


u  = 


(5.32) 


This  is  philospically  similar  to  the  approach  presented  by  Whitman  [125].  While  this 
is  feasible  for  simple  systems  with  reduced  dimensionality,  it  is  far  too  impractical  for 
large-scale  systems.  Instead  of  solving  a  series  of  these  expensive  optimization  problems, 
we  instead  consider  a  new  approach  using  trajectory  optimization,  which  solves  a  single 
optimization  problem  in  computing  open-loop  trajectories  to  (5.12). 

5.4  Multi- model  trajectory  optimization 

In  Chapter  4,  we  presented  a  trajectory  optimization  approach  for  the  nonlinear  'Hoa  control 
problem  under  the  differential  game  formulation,  where  we  computed  open-loop  trajectories 
that  satisfied  certain  optimality  conditions.  In  the  context  of  a  two-player  differential  game, 
the  notation  of  open-loop  solution  entails  that  both  players  formulate  their  strategy  at  the 
moment  the  system  starts  to  evolve,  based  on  information  available  at  the  time:  the  system 
dynamics,  the  objective  function,  and  initial  conditions.  This  strategy  cannot  be  changed 
once  the  system  evolves  and  the  saddle-point  solution  is  the  combination  of  strategies  of 
both  players  which  are  secured  against  any  attempt  by  one  player  to  unilaterally  change 
his  strategy.  We  can  extend  this  notion  to  the  multi-model  design,  where  both  players  now 
formulate  open- loop  solutions  considering  a  distribution  of  possible  system  dynamics. 
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Consider  again  the  distribution  of  dynamic  models  in  (5.11)  where 

xa  =  fa(xa )  +  g?(xa)w  +  g%(xa)u 

za  = 

xa(t0)  =  x%.  (5.33) 

To  formulate  a  trajectory  optimization  approach,  we  first  concatenate  the  state  vectors  xa 
and  za  for  the  individual  models  into  a  single  vector  in  the  form 

xa  =  (xa\. . .  ,xa\A^,  te[t0,tf] 

Za  =  (V*1,. . .  ,za^y  te[t0,tf]  (5.34) 

and  a  concatenated  vector  held  as  a  function 

Fa(X,  u,  w )  =  (fai’u(xa,  u,w),...,  fa^’u(xa,u,  w)  j .  (5.35) 

We  note  that  while  the  states  and  the  associated  dynamics  have  been  combined  to  form 
a  large  system,  the  control  inputs  remain  the  same.  In  other  words,  there  is  a  single 
control  for  the  minimizing  player  u  and  a  single  control  for  the  maximizing  player  w  with 
simultaneously  affects  all  models  in  A.  The  objective  function  given  this  concatenated 
dynamics  is 

1  rtf 

J(u,  w)  =  minmax  -  /  ri|ZQ,(t)||2— 72||tt;(f)||2l  dt ,  (5.36) 

u&u  migw  2  Jt  L  J 

subject  to  the  dynamics 

Xa  =  Fa(X,u,w )  (5.37) 

X(t0)  =  X0,  (5.38) 
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where  Xq  is  the  vector  of  initial  states  for  the  concatenated  states.  We  formulate  an 
optimization  problem  to  solve  for  u  and  w  that  are  open-loop  solutions  that  are  functions 
of  only  time.  In  practical  implementation,  the  obtained  trajectories  can  be  used  without 
considering  the  concatenated  state  vector. 

We  formulate  the  trajectory  optimization  similarly  to  the  approach  described  in  Chap¬ 
ter  4  where  we  seek  to  transcribe  a  continuous  time  robust  control  problem  into  an  equiv¬ 
alent  mixed  complementarity  problem.  Following  the  notation  defined  in  Chapter  4,  we 
redefine  the  independent  variable  to  be  r  :  [to,t/],  which  will  be  discretized  to  obtain  the 
MCP.  Rewriting  the  original  objective  function,  we  obtain 

J  =  $(X(r0),  Xa(Tf),T0,Tf)  +  f  C  (Xa(r) ,  u(t)  ,  w{t))  dr,  (5.39) 

J  To 

where  <F(Aq,(to),  Aa(r/),  To,  Tf)  is  the  terminal  cost  and  (7(Xq(t),  m(t),  w(t))  is  the  one 
step  cost  in  terms  of  the  concatenated  state  vector  Xa,  while  all  other  variables  follow 
their  original  definitions.  The  dynamics  in  this  discrete  form  is 

Xa(r)  =  Fa(Xa(T),u(T),w(r)),  (5.40) 

with  the  initial  condition  and  final  condition  are  expressed  as  boundary  constraints 

e(Xa(To),Xa(Tf),To,Tf)  =  0.  (5.41) 

We  can  also  handle  a  mixture  of  constraints  on  the  state  and  control  variables,  in  the  form 
of  an  inequality 

s(Xa(r),  u(t),  w(t))  <  0,  (5.42) 

which  is  stacked  to  form  the  overall  constraint  set  for  the  concatenated  system.  In  this 
form,  we  can  proceed  with  the  trajectory  optimizer  described  in  Section  4.3. 
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5.5  Example  -  two-dimensional  multi- model  design 

In  the  previous  section,  we  assumed  the  system  uncertainty  to  be  a  parametric  time- varying 
function,  which  limits  the  form  of  uncertainty  that  can  be  modeled.  Following  the  method 
described  in  Section  5.4,  we  augment  the  nominal  model  of  the  system 


X\  =  £2 


is  1  3 

£2  =  —  -£2  —  ~£2  —  Xl  +  X2W  +  U, 
A  a 


(5.43) 


by  an  additional  two  models 


£3  =  £4 


£4  =  -  (  -  +  9i(t) 


x\  -  -£4  -  £3  +  x4w  +  (1  +  6>i  (t))  u 


(5.44) 


and 


£5  =  £6 

X6  =  ~  Q  +02(*)^  X%  ~  ^X% 


£5  +  XqW  +  (1  +  02(t))  u 


we  assume  that 


(5.45) 


9\{t)  =  0.5  sin(t  +  5) 

02{t)  =  0.5  sin(t  —  5).  (5.46) 

These  are  two  different  forms  of  parametric  uncertainties,  in  addition  to  the  nominal 
model,  which  forms  a  six-state  system,  from  which  we  solve  for  a  single  control  input  u 
and  a  disturbance  input  w  that  are  applied  to  all  three  models.  To  test  the  effectiveness 
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of  the  trajectories  against  bounded  disturbances,  we  implement  a  receding-horizon  control 
(RHC)  similar  to  the  setup  described  in  Chapter  4.  The  total  simulation  time  is  10s  with  a 
horizon  of  0.1s,  with  50  collocation  points  for  each  computed  trajectory.  The  computation 
time  for  each  trajectory  is  about  25s. 

A  snapshot  of  the  trajectories  at  the  beginning  of  the  horizon  is  shown  in  Fig.  5.2.  The 
Xi  and  x2  trajectories  for  all  three  models  are  compared  in  Fig.  5.2(a),  and  we  see  that 
there  are  visible  differences  in  the  three  models.  The  input  and  disturbance  trajectories 
are  shown  in  Fig.  5.2(b),  which  the  three  models  share.  Similar  to  the  results  in  Chapter  4, 
we  can  also  apply  constraints  on  the  control  inputs.  Fig.  5.3(b)  shows  control  trajectories 
generated,  with  saturations  at  ±  =  0.5.  We  see  that  despite  the  saturation,  the  resulting 
trajectories  are  stable. 

So  far,  this  three-model  system  represents  system  parametric  uncertainty  similar  to  the 
results  obtained  in  Chapter  4.  With  a  multi-model  design,  we  can  also  address  potential 
unmodeled  dynamics.  Consider  the  example  raised  in  (5.8),  where  a  second-order  low  pass 
filter  with  a  cutoff  frequency  of  10Hz  is  applied  to  the  control  input  u.  In  state  space  form, 
this  low  pass  filter  is  equivalent  to  a  linear  system 


if  =  Afxf  +  Bfu 
uf  =  Cfxf  +  Dfu > 


(5.47) 


where  the  input  u  is  low-pass  filtered  to  produce  the  new  input  Uf.  The  system  matrices 
are  given  by 


1.9112  -0.9150 


1 


1 


0 


0 


Cf=  0.0037,0.0001 


Dj  =  9.4469E-04. 


(5.48) 
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(a)  State  trajectories  (b)  Input  and  disturbance  trajectories 

Figure  5.2:  Optimized  trajectories  for  a  multi-model  two-dimensional  system  under  dis¬ 
turbance.  Two  models  with  varying  parameters  are  provided  in  addition  to  the  nominal 
model  under  identical  disturbance,  and  a  single  control  input  stabilizes  all  three  systems. 

The  overall  system  is  given  by  two  model.  First,  the  nominal  model 


X\  =  x2 


1  5  1  3 

x2  =  —  -x%  —  -X%  —  X\  +  x2w  +  u 


(5.49) 
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(a)  State  trajectories  (b)  Input  and  disturbance  trajectories 

Figure  5.3:  Optimized  trajectories  for  a  multi-model  two-dimensional  system  under  distur¬ 
bance  with  control  input  constraints.  Two  models  with  varying  parameters  are  provided 
in  addition  to  the  nominal  model  under  identical  disturbance,  and  a  single  constrained 
control  input  (±0.5)  stabilizes  all  three  systems. 

and  the  additional  model 


£3  =  £4 

£4  =  —  -£2  —  -£2  —  £1  +  £2  w  +  ( CfXf  +  DfU ) 

Xf  =  AfXf  +  BfU.  (5.50) 

In  total,  the  concatenated  state  space  of  the  system  consists  of  six  states  -  two  for  the 
nominal  model  and  four  for  the  additional  model,  which  includes  two  states  for  the  filter. 

We  can  now  apply  this  new  multi-model  design  in  a  receding-horizon  setting  with  the 
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(a)  State  trajectories  (b)  Input  and  disturbance  trajectories 

Figure  5.4:  Optimized  trajectories  for  a  multi-model  two-dimensional  system  under  dis¬ 
turbance.  In  addition  to  the  nominal  model  for  the  system  dynamics,  an  additional  model 
accounts  for  unmodeled  dynamics  in  the  form  of  a  low-pass  filter.  A  single  control  trajec¬ 
tory  stabilizes  both  systems  under  identical  disturbance. 

same  parameter  as  previously  described.  Fig.  5.4  shows  a  snapshot  of  the  trajectories 
obtained  for  this  system  in  the  beginning  of  a  horizon. 

We  can  now  examine  the  performance  of  the  two  RHC  setups  described.  The  first  RHC 
setup  consists  of  three  models  which  only  model  parametric  uncertainty,  described  in  (5.46). 
The  second  RHC  setup  consists  of  two  models,  which  accounts  for  unmodeled  dynamics  in 
the  form  of  a  low-pass  filter,  described  in  (5.47).  We  test  the  RHC  using  both  the  nominal 
model  and  the  true  model  which  includes  the  unmodeled  dynamics.  In  both  cases,  we 
assume  the  actual  parametric  uncertainty  to  be  a  sinusoidal  signal  9{t)  =  O.Olsin(t). 

Fig.  5.5  shows  a  simulation  of  the  RHC  where  the  system  is  disturbed  using  the  optimal 
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(a)  Nominal  model  (b)  True  model  with  unmodeled  dynamics 

Figure  5.5:  Comparison  of  two  robust  RHC  setups.  RHC  using  three  models,  accounting 
only  for  parametric  uncertainty,  is  stable  only  for  the  nominal  model.  A  two-model  design 
accounting  for  unmodeled  dynamics  is  stable  for  both  the  nominal  and  the  true  model. 


disturbance  computed  in  (3.58).  As  shown  in  Fig.  5.5(a),  the  first  RHC  with  three  models 
exhibits  better  performance  with  respect  to  the  nominal  model,  while  the  second  RHC 
is  overly  conservative,  leading  to  a  longer  settling  time  for  both  states.  However,  with 
respect  to  the  true  model,  Fig.  5.5(b)  shows  that  the  three-model  RHC  failed  to  stabilize 
the  system. 

In  Chapter  6,  we  will  analyze  the  computational  performance  of  this  multi- model  tra¬ 
jectory  optimizer  in  the  context  of  a  more  complex  dynamic  system  and  provide  a  detailed 
discussion  on  the  quality  of  solutions  and  computation  time.  In  addition,  we  will  also 
provide  comparisons  to  alternative  methods,  such  as  LQR  and  non-robust  trajectory  opti- 
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mization  in  a  receding-horizon  control  framework. 


5.6  Summary 

In  this  chapter,  we  developed  a  heuristic  approach  to  robust  control  design  based  on  the 
concept  of  multiple  models,  which  evaluated  the  performance  of  a  given  control  law  over  a 
distribution  of  possible  system  dynamics.  A  distribution  of  models  allowed  the  optimiza¬ 
tion  to  consider  the  expected  cost  over  a  range  of  possible  outcomes  and  account  for  the 
maximum  cost  for  the  worst-case  scenario.  We  improved  on  the  approaches  presented  in 
the  previous  chapters  by  removing  restrictive  assumptions  made  on  the  forms  of  uncertain¬ 
ties,  which  enabled  the  approach  to  compensate  for  effects  such  as  unmodeled  dynamics. 
We  developed  these  results  in  the  context  of  necessary  sufficient  conditions  for  individual 
value  functions  similar  to  Chapter  3  and  utilized  the  robust  trajectory  optimization  ap¬ 
proach  in  Chapter  4  to  compute  open-loop  trajectories  that  simultaneously  satisfied  the 
optimality  conditions  for  multiple  models.  We  demonstrated  the  use  of  these  trajectories 
in  a  receding-horizon  setting  for  a  simple  two-state  system. 

The  proposed  multi-model  design  is  a  powerful  approach  to  differential  games.  Where 
we  have  proposed  a  distribution  of  system  dynamics,  the  approach  can  be  easily  extended  to 
scenarios  where  a  distribution  of  different  objective  functions  and  information  structures 
are  given  to  the  players,  resulting  in  an  even- wider  discrepancies  on  the  knowledge  and 
belief  about  the  underlying  game.  This  could  potentially  be  used  to  model  respective 
decision  makers  in  a  large  scale  systems,  for  example,  in  a  mean-held  games  setting. 
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6 

Results 


6.1  Overview 

This  chapter  presents  experimental  validation  of  the  nonlinear  robust  control  designs  pre¬ 
sented  in  Chapter  3,  Chapter  4,  and  Chapter  5.  We  base  our  results  on  three  platforms:  a 
three- link  planar  model  of  a  bipedal  humanoid  robot,  a  seven  degrees-of- freedom  robot  ma¬ 
nipulator,  and  a  15  degrees-of- freedom  model  based  on  the  Atlas  bipedal  humanoid  robot 
from  Boston  Dynamics.  These  models  are  chosen  for  their  varying  degrees  of  complexity 
and  we  test  the  proposed  nonlinear  robust  control  designs  on  these  models  using  different 
combinations  of  initial  conditions,  parameters,  and  forms  of  uncertainties.  In  these  results, 
we  hope  to  experimentally  demonstrate  the  significant  effects  of  various  forms  of  uncertain¬ 
ties  on  the  behaviors  of  the  closed-loop  system  and  hence  the  need  to  robustify  standard 
controllers.  We  also  discuss  the  potential  advantages  and  disadvantages  of  the  techniques 
proposed  in  the  previous  chapters. 
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Figure  6.1:  Planar  model  of  a  bipedal  humanoid  robot  with  three  links.  The  three  links  are 
referred  to  as  the  stance  leg  (Zi),  torso  (Z2),  and  swing  leg  (Z3),  respectively.  Of  the  three 
joints,  q\  is  stance  ankle  angle,  q2  is  the  angle  of  the  torso  with  respect  to  the  vertical, 
and  q3  is  the  angle  of  the  swing  leg  with  respect  to  the  stance  leg.  The  parameters  for  the 
model  are  given  in  Table  6.1. 

6.2  Three-link  planar  bipedal  humanoid  model 

6.2.1  Model 

We  model  a  three-link  bipedal  humanoid  robot  in  the  sagittal  plane.  A  diagram  for  the 
system  is  shown  in  Fig.  6.1,  with  the  physical  parameters  listed  in  Table  6.1.  Of  the  three 
joints,  q\  is  stance  ankle  angle,  q2  is  the  angle  of  the  torso  with  respect  to  the  vertical, 
and  q3  is  the  angle  of  the  swing  leg  with  respect  to  the  stance  leg.  The  three  links  are 
referred  to  as  the  stance  leg  (Zi),  torso  (Z2),  and  swing  leg  (Z3),  respectively,  and  we  follow  a 
standard  Lagrangian  approach  to  derive  the  rigid-body  dynamics  for  the  model.  We  define 
the  generalized  coordinates  of  the  system  q(t )  =  (qi(t),q2{t),q3(t))T  according  to  Fig.  6.1 
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Table  6.1:  Physical  parameters  for  a  planar  three-link  bipedal  model 


m  (kg) 

r  (m) 

4  (kg  ■  m 2) 

1  (m) 

Stance  leg  (l\) 

10.9 

0.37 

0.57 

0.77 

Torso  (l2) 

65 

0.4 

4.3 

- 

Swing  leg  (l3) 

12.6 

0.43 

0.85 

0.81 

and  the  Lagrangian 

i  i 

is  formed  by  taking  the  difference  between  the  sum  of  kinetic  energies  and  the  sum  of 
potential  energies.  The  Euler- Lagrange  equation  is 

d_  ( d£\  _  dC 

dt  \dqi)  dqt 

where  t,:  is  the  external  force  acting  on  the  ith  generalized  coordinate,  which  in  this  case,  is 
individual  actuated  degree  of  freedom  in  the  robot.  Solving  the  Euler-Lagrange  equation 
gives  the  nonlinear  equations  of  motion,  which  can  be  organized  in  a  form  that  is  linear  in 
acceleration  and  torque 

M (q)q  +  C(q,  q)q  +  G(q)  =  r,  (6.3) 

where  M  is  the  mass  matrix  that  is  square,  symmetric,  and  positive  definite;  C  consists 
of  centripetal  and  Coriolis  forces;  G  represents  the  gravitational  and  other  forces  acting 
on  the  robot;  and  r  is  a  vector  of  torques  that  are  the  controlled  inputs  to  the  system. 
In  addition,  we  note  that  (M  —  2(7)  is  a  skew-symmetric  matrix.  This  form  is  commonly 
referred  to  as  the  manipulator  equation  [135]. 

Rearranging  the  terms  and  inverting  the  mass  matrix,  we  can  rewrite  (6.3)  as  a  series 
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of  first-order  nonlinear  equations 


x  =  f{x)  +  g(x)u 
z  =  [h(x),u]T 

x(0)  =  x0,  (6.4) 

with  x  e  X  is  the  state  vector,  z  €  Z  is  the  output  vector,  u  G  U  is  the  vector  of  controlled 
inputs  to  the  system,  xq  is  a  vector  of  initial  states ,  and  the  matrices  f(x)  and  g(x) 

f(x)  =  —M~1(q)  ( C(q ,  q)q  +  G(q )) 

g(x)  =  M_1(g),  (6.5) 

are  the  system  matrices  that  are  smooth  vector  fields.  Following  the  uncertainty  and 
disturbance  modeling  described  in  Section  1.3.2,  we  augment  the  system  with  uncertain 
terms  A f(x,6,t)  and  A g2(x,0,t),  as  well  as  the  external  disturbances  wi,  w2,  and  w3  to 
the  three  joints.  We  use  w  to  denote  a  vector  of  external  disturbances  with  the  associated 
system  matrix  g\  (x) .  For  the  parametric  system  uncertainty,  we  use  the  term  6  to  represent 
an  offset  to  the  mass  of  the  torso  link,  with  the  nominal  value  being  65kg,  as  listed  in 
Table  6.1. 

The  final  description  of  the  problem  is 

x(t)  =  [f(x)  +  A f(x,  9,  f)]  +  g^wit)  +  [g2(x)  +  A g2(x,  9,  f)]  u(t) 
z(t)  =  [hi(yx),u(t)]T 

x(t0)  =  x0,  (6.6) 

where  f(x),  gi(x),  g2(x),h(x)  are  system  matrices  and  A f(x,6,t)  and  A g2(x,9,t)  are 
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parameterized  uncertain  terms.  We  use  the  same  optimization  objective 

J(x,  u.  w,  6)  =  min  max  - 
v  ’  ’  ’  J  neu  wew  2 

0e@ 

as  stated  in  the  previous  chapters. 

6.2.2  Standing  balance  control 

Due  to  the  nonlinear  nature  of  the  dynamics,  no  analytical  solutions  exist  for  this  model. 
Although  the  dimensionality  of  this  planar  three-link  model  is  relatively  low  compared 
to  other  problems  investigated  in  this  chapter,  the  computational  burdens  of  pursuing  a 
numerical  dynamic  programming-based  solution,  as  described  in  Section  3.2,  remains  high 
and  would  require  extensive  computational  machinery  for  a  parallelized  implementation. 
Instead,  we  implemented  a  value  function  approximation  based  control  design,  as  described 
in  Section  3.3.  This  is  done  to  obtain  insight  into  the  performance  of  the  proposed  ap¬ 
proximation  scheme  in  Section  3.4.3,  as  well  as  to  establish  a  benchmark  for  the  single  and 
multiple  model  trajectory  optimization. 

Given  the  approximate  value  function  V  (x)  as  a  weighted  sum  of  a  set  of  basis  functions 

n 

V(x)  =  '^jwi<f>i(x),  (6.8) 

2=1 

with  <f>i(x)  given  by  Fourier  basis  functions.  In  this  case,  the  system  of  integral  equations 
must  be  solved  numerically,  with  a  numerical  quadrature  embedded  within  a  trust  region 
dogleg  iteration  to  solve  the  system  of  equations.  To  approximate  the  value  function, 
we  used  the  Fourier-basis  functions  described  in  Section  3.3  in  an  uncoupled  setting.  As 
described  in  Section  3.3,  for  an  k- th  order  Fourier  approximation  in  n-dimensions,  there  are 
2 (h  +  l)n  basis  functions,  which  is  an  exponential  explosion  of  parameters  in  the  number 


rtf  r 


'to 


l*(*)ll2-72INt)ll! 


dt 


(6.7) 
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Time 

(a)  State  trajectories 


500 


■i  ooo - 1 1 1 1 1 - 1 

0  0.5  1  1.5  2  2.5  3 

Time 


(b)  Input  and  disturbance  trajectories 


Figure  6.2:  Control  design  based  on  value  function  approximation.  A  planar  three-link 
model  is  stabilized  under  bounded  external  disturbance  and  internal  parametric  uncertain¬ 
ties. 


of  dimensions.  Following  the  procedure  to  uncouple  the  basis  functions,  there  are  n{k  +  1) 
number  of  uncoupled  basis  functions.  With  k  —  35  and  n  =  6,  this  is  the  difference  between 
4,  353,  564,  672  coupled  and  216  uncoupled  basis  functions. 

We  model  the  scenario  of  a  three-link  robot  in  the  standing,  upright  position  with  initial 
velocities.  Fig.  6.2  gives  the  closed-loop  response  of  the  value  function  approximation-based 
control  design  given  a  set  of  initial  conditions  under  the  nominal  model.  Fig.  6.2(a)  shows 
the  stable  state  trajectories  along  with  the  center-of-mass  (COM)  position  and  velocity  in 
the  horizontal  direction.  We  see  that  due  to  the  external  disturbance,  the  COM  position 
initially  moves  in  the  positive  direction  before  stabilizing  back  to  the  upright  position. 
Fig.  6.2(b)  shows  the  corresponding  control  inputs  and  disturbance  inputs. 

The  global  necessarily  and  sufficient  conditions  for  optimality  derived  in  Section  3.4.2 
would  have  only  limited  utility  in  this  model  since  it  is  difficult  to  incorporate  state  and 
input  constraints  to  model  joint  and  actuator  limits.  Doing  so  would  require  formulating 
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additional  terms  in  the  Hamiltonian  to  incorporate  the  constraints,  which  significantly 
complicates  the  implementation  of  the  resulting  control  design.  While  other  numerical 
implementations  for  dynamic  programming  can  be  used  for  this  purpose  (see  for  example, 
Section  1.2.4),  we  only  include  an  unconstrained  version  of  the  optimization  problem  to 
establish  a  baseline  performance  for  subsequent  analysis.  Due  to  the  dimensionality  of  the 
problem,  we  also  must  make  a  simplifying  assumption  on  the  coupled  nature  of  the  value 
function.  The  trade-off  of  an  uncoupled  value  function  approximation  is  the  quality  of 
the  solution  with  computational  efficiency,  and  we  discuss  these  issues  more  in-depth  in 
Section  6.5. 

The  main  focus  of  this  section  is  to  apply  the  single- model  trajectory  optimization  from 
Chapter  4  and  multi- model  trajectory  optimization  in  Chapter  5  under  a  receding- horizon 
control  formulation.  Since  these  techniques  can  easily  incorporate  additional  constraints  on 
the  states  and  control  inputs,  we  also  simulate  the  effects  of  these  constraints  on  the  closed- 
loop  performance.  To  demonstrate  the  performance  of  these  robust  trajectory  optimizers, 
they  are  tested  against  a  LQR  controller  designed  with  a  linearized  model  of  the  dynamics, 
as  well  as  a  traditional  non-robust  trajectory  optimizer  that  does  not  explicitly  model 
system  uncertainties. 

For  the  RHC  setups,  we  model  a  robot  in  the  standing,  upright  position  with  initial 
velocities.  For  the  three  trajectory-based  approaches  (standard,  single  model  robust,  and 
multi- model  robust),  we  implement  a  receding-horizon  control  (RHC)  similar  to  the  ap¬ 
proach  described  in  Chapter  4  and  Chapter  5.  We  compute  trajectories  for  the  system, 
apply  the  obtained  control  inputs  for  a  fixed  duration,  then  compute  new  trajectories  using 
the  current  states  as  initial  conditions.  For  all  three  approaches,  the  total  simulation  time 
is  3s  with  a  horizon  of  0.1s,  with  30  collocation  points  for  each  computed  trajectory. 

Fig.  6.3  gives  a  snapshot  of  the  RHC  at  the  beginning  of  a  horizon,  which  takes  in  the 
specified  initial  conditions  and  computes  trajectories  based  on  the  nominal  model  accord- 
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Figure  6.3:  Snapshot  of  the  receding-horizon  control  for  a  planar,  bipedal  robot  with  three 
links  under  disturbance.  The  optimized  trajectories  are  generated  using  the  nominal  model 
at  the  beginning  of  the  horizon  with  the  specified  initial  conditions  modeling  the  robot  in  a 
standing,  upright  position  with  some  initial  velocities.  The  stable  trajectories  maintaining 
the  standing  balance  of  the  robot  are  applied  for  a  fixed  duration  of  the  RHC  horizon. 


ing  to  the  objective  function  in  (6.7).  These  results  are  extremely  similar  to  Fig.  6.2(a) 
using  the  value  function  approach  with  a  near  identical  form  of  external  disturbance.  The 
COM  position  initially  moves  in  the  positive  direction  before  stabilizing  back  to  the  up¬ 
right  position.  We  see  that  the  worst-case  internal  parametric  uncertainty  is  achieved  by 
increasing  the  parameter  0,  which  is  the  mass  of  the  torso  link,  as  shown  in  Fig.  6.3(b). 

Since  the  trajectory  optimizer  is  able  to  handle  constraints  on  the  control  inputs,  we 
repeat  the  standing  balance  experiment  but  constrain  the  three  control  inputs  Ui,  u2,  and 
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Figure  6.4:  Snapshot  of  the  receding-horizon  control  for  a  planar,  bipedal  robot  with 
three  links  under  disturbance.  The  trajectories  are  generated  with  a  single  model  with 
constrained  control  inputs.  The  stable  trajectories  maintaining  the  standing  balance  of 
the  robot  despite  limited  control  authority. 


u%  to  ±500.  The  resulting  trajectories  are  given  in  Fig.  6.4,  where  there  is  a  region  of 
control  saturation  in  r2  but  the  resulting  state  trajectories  remain  stable.  Fig.  6.4(a)  gives 
the  state  trajectories  and  COM  position  and  velocity  in  the  horizontal  direction.  Fig.  6.4(b) 
gives  the  corresponding  control  input  and  disturbance  trajectories. 

With  a  multi-model  design,  we  can  also  address  potential  unmodeled  dynamics.  Con¬ 
sider  the  example  raised  in  (5.8),  where  a  second-order  low  pass  filter  with  a  cutoff  fre¬ 
quency  of  10Hz  is  applied  to  the  control  input  u.  In  state  space  form,  this  low  pass  filter 
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(a)  State  trajectories  (b)  Input  and  disturbance  trajectories 


Figure  6.5:  Snapshot  of  the  receding-horizon  control  for  a  planar,  bipedal  robot  with  three 
links  under  disturbance.  The  trajectories  are  generated  with  two-model  design,  with  18 
concatenated  states.  The  stable  trajectories  maintaining  the  standing  balance  of  the  robot 
for  both  models. 

is  equivalent  to  a  linear  system 


if  =  Afxf  +  Bfu 

if  =  CfXf  +  Dfu ,  (6.9) 

where  the  input  u  is  low-pass  filtered  to  produce  the  new  input  Uf.  Given  three  control 
inputs  u\,  ii2 ,  and  u?, ,  we  designed  three  low-pass  filters  according  to  (6.29)  to  produce  the 
filtered  control  inputs.  These  filtered  inputs  are  used  in  the  dynamics  for  a  second  model 
to  augment  the  nominal  model.  In  total,  this  multi-model  design  contains  18  concatenated 
states:  six  for  the  nominal  model,  six  for  the  additional  model,  and  two  states  for  each 
filter.  As  shown  in  Fig.  6.5(a),  the  addition  of  the  unmodeled  dynamics  has  a  signihcant 
effect  on  the  response  of  the  system,  as  the  states  X2  and  X3  have  widely  different  joint 
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Table  6.2:  Model  uncertainties  for  a  planar  three- link  bipedal  humanoid 


SI 

S2 

S3 

S4 

External  Disturbance 

None 

Sinusoidal 

Open-loop 

Open-loop 

Parametric  Uncertainty 

None 

None 

Sinusoidal 

Sinusoidal 

Unmodeled  Dynamics 

None 

None 

None 

Low-pass  filter 

trajectories  between  the  nominal  and  the  second  model  (with  x8  and  xg).  Despite  these 
differences,  the  control  input  stabilized  all  trajectories  in  the  two  models. 

We  now  compare  the  single  and  multi- model  robust  trajectory  optimization  against 
an  LQR  controller  designed  with  a  linearized  model  of  the  dynamics,  as  well  as  a  tradi¬ 
tional  non-robust  trajectory  optimizer  that  does  not  explicitly  model  system  uncertainties. 
Four  scenarios  S1-S4  were  set  up  using  various  combinations  of  initial  conditions,  exter¬ 
nal  disturbances,  parametric  uncertainties,  and  unmodeled  dynamics.  Table  6.2  lists  the 
conditions  for  the  four  scenarios. 

In  SI,  the  four  approaches  were  tested  with  no  external  disturbances,  no  parametric  un¬ 
certainties,  and  no  unmodeled  dynamics.  The  non-zero  initial  condition  for  the  simulation 
was  set  up  outside  the  linear  domain  of  attraction  of  the  LQR  control.  As  expected,  the 
LQR  control  failed  to  stabilize  the  system,  while  the  three  RHC-based  approaches  success¬ 
fully  stabilized  the  system,  with  similar  closed-loop  responses.  The  closed-loop  response  of 
the  four  approaches  are  shown  in  Fig.  6.6(a). 

In  S2,  the  four  approaches  were  tested  with  sinusoidal  external  disturbances,  no  para¬ 
metric  uncertainties,  and  no  unmodeled  dynamics.  The  three  links  start  from  an  upright 
position  with  zero  initial  conditions,  but  the  external  disturbances  perturbed  the  system, 
causing  LQR  control  to  fail.  The  three  RHC-based  approaches  continued  to  be  able  to 
stabilize  the  system  given  the  external  disturbances.  The  closed-loop  response  of  the  four 
approaches  are  shown  in  Fig.  6.6(b). 
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(a)  SI  (b)  S2 


(c)  S3 


(d)  S4 


Figure  6.6:  Comparison  of  LQR,  standard  RHC,  robust  single-model  RHC,  and  robust 
multi-model  RHC.  Four  scenarios  S1-S4  were  setup  using  various  combinations  of  initial 
conditions,  external  disturbances,  parametric  uncertainties,  and  unmodeled  dynamics  listed 
in  Table  6.2 
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In  S3,  the  four  approaches  were  tested  with  an  open-loop  external  disturbance  com¬ 
puted  using  the  single-model  robust  trajectory  optimization.  In  addition,  we  augment  the 
system  with  a  sinusoidal  parametric  uncertainty  on  the  torso  mass  m2,  but  no  unmodeled 
dynamics.  Given  that  this  setup  matches  closely  with  the  assumptions  made  by  the  single¬ 
model  RHC,  we  expect  that  it  will  have  the  best  performance.  Indeed,  the  standard  RHC 
failed  to  stabilize  the  system  and  the  states  of  the  system  violated  the  constraints  of  the 
solver  around  t  =  Is.  The  single  and  multi-model  RHC  both  were  able  to  stabilize  the 
system,  with  similar  performances.  The  closed-loop  response  of  the  four  approaches  are 
shown  in  Fig.  6.6(c). 

In  S4,  the  four  approaches  were  tested  with  an  open-loop  external  disturbance  computed 
using  the  single-model  robust  trajectory  optimization,  sinusoidal  parametric  uncertainty, 
and  unmodeled  dynamics  in  the  form  of  a  low-pass  filter  on  the  control  inputs.  In  this  case, 
only  the  multi-model  RHC  was  able  to  stabilize  the  system.  The  closed-loop  response  of 
the  four  approaches  are  shown  in  Fig.  6.6(d). 

One  important  design  parameter  in  addressing  model  uncertainty  is  the  parameter  7 
in  the  optimization  objective  (6.7).  As  discussed  in  Section  3.2,  7  bounds  the  disturbance 
inputs  w  and  can  be  adjusted  to  vary  the  amount  of  disturbance  given  to  the  robot.  To 
visualize  the  effect  of  7,  we  generated  center-of-mass  (COM)  position  and  velocity  in  the 
horizontal  direction  with  respect  to  time,  and  plotted  the  phase  diagram  in  Fig.  6.7  using 
several  choices  of  7.  The  trajectories  show  that  an  initial  non-zero  COM  location  can  be 
stabilized  back  to  the  upright  position,  given  disturbances  of  various  magnitudes. 
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Figure  6.7:  Phase  diagram  generated  using  center-of-mass  (COM)  position  and  velocity  in 
the  horizontal  direction  with  respect  to  time.  The  trajectories  show  that  an  initial  non-zero 
COM  location  can  be  stabilized  back  to  the  upright  position,  given  disturbances  of  various 
magnitudes  specified  using  7.  No  parametric  uncertainties  are  specified. 

6.3  KUKA  KR  5  sixx  R650  manipulator  control 

6.3.1  Operational  space  control 

In  this  section,  we  extend  the  proposed  robust  control  designs  to  implement  the  operational 
space  control  of  a  robotic  manipulator.  As  with  Section  6.2,  we  assume  the  dynamics  to 
take  the  form 

M(q)q  +  C(q,q)q  +  G(q)  =  t,  (6.10) 

where  M  is  the  mass  matrix  that  is  square,  symmetric,  and  positive  definite;  C  consists  of 
centripetal  and  Coriolis  forces;  G  represents  the  gravitational  and  other  forces  acting  on 
the  robot;  and  r  is  a  vector  of  torques  that  are  the  controlled  inputs  to  the  system.  To  be 
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consistent  with  the  notations  used  in  previous  work,  we  also  define 

F(q,q)  =  ~C(q,q)  -  G(q)  (6.11) 

and  hence  the  manipulator  equation  can  be  written  in  the  form 

M(q)q  =  u  +  F(q,q).  (6.12) 

We  assume  that  the  task  description  can  be  written  in  the  form  of  a  constraint  given  by 
the  function 

h(q,  q, t)  =  0.  (6.13) 

Particularly,  for  a  class  of  tasks  that  can  be  formulated  as 

A  (q,q,t)q  =  b(q,q,t),  (6.14) 

which  can  be  achieved  for  most  tasks  by  differentiating  h(q,  q,  t )  with  respect  to  time.  This 
class  of  problems  has  an  elegant  optimization-based  solution,  given  by  Peters  [1].  Given 
an  objective  function 

J  =  uTN(t)u,  (6.15) 

subject  to  the  nonlinear  dynamics 

M(q)q  =  u  +  F(q,q)  (6.16) 

and  the  task  constraint 

^  (q,Q,t)q  =  b(q,q,t).  (6.17) 
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The  solution  is  given  by  the  control  law 

u  =  AH  ^AM-'N-^y  ( b  -  AM-'F),  (6.18) 

where  D+  denotes  the  pseudo  inverse  for  a  general  matrix  D  and  denotes  the  the 
symmetric,  positive  definite  matrix  for  which  =  D. 

For  the  operational  space  control  of  this  system,  the  task  description  is  given  by  the 
end-effector  trajectory,  where 

h(q,  t )  =  f(q(t ))  -  xd{t)  =  x(t)  -  xd(t)  =  0,  (6.19) 

with  f(q(t ))  being  the  forward  kinematics  of  the  system  that  maps  the  joint  positions  q{t) 
to  the  task  space.  Given  the  desired  trajectory  xd(t),  we  can  use  an  attractor  in  the  task 
space  in  the  form 

(x  -  xd)  +  Kd(x  -  xd )  +  Kp(x  -  xd)  =  0,  (6.20) 

where  KD  and  KP  are  positive-definite  damping  and  proportional  gains.  Using  the  Jaco¬ 
bian  relations, 

x  =  J{q)q  (6.21) 

and 

x  =  J(q)q  + J(q)q,  (6.22) 

the  task  space  constraint  becomes 

xd  +  Kd(x  -  xd)  +  Kp(x  -  xd)  =  J(q)q  +  j{q)q,  (6.23) 

which  can  be  put  in  the  form 

=  (6.24) 
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with 

A(q,  q,t)  =  J  (6.25) 

and 

b(q,q,t)  =  xd  +  Kd(x  -  xd)  +  Kp(x  -  xd)  -  J(q)q.  (6.26) 

As  discussed  in  Peters  [1],  the  choice  of  the  parameter  N(t)  in  the  objective  function  is 
important  in  influencing  the  behaviors  of  the  resulting  optimization.  For  choices  N  =  M-1, 
N  =  M~2,  and  N  =  I,  different  versions  of  the  control  law  (6.18)  can  be  produced 
with  drastically  different  behaviors.  While  the  feedback  control  law  is  optimal  with  the 
assumption  that  the  given  dynamic  model  is  perfect,  actual  performance  of  the  control 
law  differs  depending  on  the  accuracy  of  the  given  model.  As  discussed  in  another  paper 
by  Peters  [136],  small  unmodeled  nonlinearities  can  have  a  drastic  effect.  If  the  estimated 
mass  matrix  differs  by  a  small  amount,  the  control  law  will  result  in  unstable  null-space 
behaviors.  We  refers  the  readers  to  Peters  [136]  for  a  detailed  discussion  on  robustness 
issues  for  this  particular  model. 

In  this  work,  we  are  interested  in  extending  the  robust  trajectory  optimization-based 
approaches  proposed  in  Chapter  4  and  Chapter  5  to  compensate  for  potential  modeling 
errors  in  the  system  matrices. 

6.3.2  Results 

Our  experimental  platform  is  the  KUKA  KR  5  sixx  R650  (Fig.  6.8),  a  popular  industrial 
robot.  It  is  a  small-size  6R  manipulator  with  a  spherical  wrist  with  a  total  weight  of  127kg, 
5kg  of  payload,  and  maximum  stretch  of  0.65m  from  the  base.  For  all  experiments,  we 
implement  the  operational  space  control  for  this  robot’s  end  effector  to  track  a  circular 
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(a)  KUKA  KR  5  sixx  R650  indus-  (b)  Kinematic  model  of  the  KUKA  R650  robot 

trial  robot 


Figure  6.8:  KUKA  KR  5  sixx  R650  industrial  robot. 


motion.  The  X,  Y,  Z  components  of  the  end  effector  are  given  by  sinusoidal  signals 


X  =  0.6sin(t) 

Y  =  0.15  sin(t) 

Z  =  0.6cos(t).  (6.27) 


As  discussed  previously,  the  dynamics  for  this  robot  takes  the  form 


M(q)q  +  C(q ,  q)q  +  G(q)  —  r. 


(6.28) 


In  this  operational  space  control  scenario,  we  model  uncertainty  in  the  mass  matrix  of  the 
system  in  the  form  of  a  time- varying  parameter  variation,  as  discussed  in  [136]. 
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Table  6.3:  KUKA  KR  5  sixx  R650  joint  limits 


Axis 

Range 

Speed 

Axis  1  (Al) 

±170° 

375° /s 

Axis  2  (A2) 

+45°/-190° 

300° /s 

Axis  3  (A3) 

+165°/-119° 

375° /s 

Axis  4  (A4) 

±190° 

O 

o 

m 

Axis  5  (A5) 

±120° 

410°/s 

Axis  6  (A6) 

±358° 

660° /s 

We  implemented  two  trajectory-based  approaches  using  the  single  model  and  multi¬ 
model  robust  trajectory  optimizers  proposed  in  Chapter  4  and  Chapter  5.  We  generate  the 
optimal  trajectories  for  this  operational  space  control  task,  incorporating  the  full  dynamics 
of  the  robot  as  well  as  the  limits  on  the  joint  positions  and  velocities.  These  figures  are  given 
in  Table  6.3.  For  the  multiple-model  trajectory  optimizer,  we  implement  one  additional 
model,  incorporating  a  second-order  low  pass  filter  on  the  control  inputs,  in  addition  to 
the  nominal  model.  The  filter  dynamics  take  the  form 


if  =  AfXf  ±  BfU 

Uf  =  CfXf  +  DfU ,  (6.29) 

as  discussed  in  Section  6.2.  Since  the  robot  has  six  degrees  of  freedom,  each  model  contains 
12  states,  with  two  additional  states  for  the  filter  on  each  control  input.  In  total,  there  are 
36  states  for  the  multi-model  trajectory  optimizer. 

Both  trajectory  optimization  methods  are  implemented  a  receding-horizon  control  frame¬ 
work  similar  to  the  setup  described  in  Chapter  4  and  Chapter  5.  We  compute  trajectories 
for  the  system,  apply  the  obtained  control  inputs  for  a  fixed  duration,  then  compute  new 
trajectories  using  the  current  states  as  initial  conditions.  For  all  both  approaches,  the  total 
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Figure  6.9:  A  snapshot  of  the  optimized  joint  trajectories  for  the  KUKA  KR  5  sixx  R650 
circular  task.  Joint  positions  (blue)  and  velocities  (red). 

simulation  time  is  15s  with  a  horizon  of  0.1s,  with  30  collocation  points  for  each  computed 
trajectory.  Fig.  6.9  shows  a  snapshot  of  the  optimized  joint  trajectories  generated  at  the 
beginning  of  a  horizon  for  one  full  revolution  in  the  circular  trajectory. 

To  compare  the  performance  of  these  methods,  we  also  implemented  the  optimal 
control-based  operational  space  framework  proposed  in  Peters  [1].  For  the  objective  func¬ 
tion,  we  used 

N  =  M~2,  (6.30) 

in  the  formulation  in  (6.15),  which  is  a  version  of  the  well-known  control  law  proposed  in 
Khatib  [137]  with  an  additional  null-space  term,  with  the  interpretation  consistent  with 
principle  of  virtual  work  of  d’Alembert.  We  refer  the  readers  to  [1]  for  a  detailed  derivation 
of  the  control  law  and  its  interpretation. 

To  illustrate  the  effects  of  different  types  of  uncertainties,  three  scenarios  S1-S3  were 
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Table  6.4:  Model  uncertainties  for  the  KUKA  KR  5  sixx  R650 


SI 

S2 

S3 

Parametric  Uncertainty 

Unmodeled  Dynamics 

None 

None 

Sinusoidal 

None 

Sinusoidal 

Time  delay 

set  up  using  various  combinations  of  initial  conditions,  parametric  uncertainties,  and  un¬ 
modeled  dynamics.  Table  6.4  lists  the  conditions  for  the  three  scenarios.  In  SI,  the  three 
approaches  were  tested  with  no  parametric  uncertainties  and  no  unmodeled  dynamics.  As 
expected,  all  three  approaches  successfully  tracked  the  circular  reference  trajectories  for  the 
end  effector,  with  comparable  tracking  performances.  The  multi-model  approach  deviated 
from  the  others  as  it  has  the  most  conservative  assumptions  in  terms  of  uncertainty  and 
performed  the  worst  when  no  uncertainties  were  present  in  the  system.  The  closed-loop 
response  of  the  three  approaches  are  shown  in  Fig.  6.10(a). 

In  S2,  the  three  approaches  were  tested  with  a  sinusoidal  parametric  uncertainty  in 
the  mass  matrix  of  the  manipulator  (as  described  in  Peters  [136]),  with  no  additional 
unmodeled  dynamics.  Given  that  this  setup  matches  closely  with  the  assumptions  made 
by  the  single-model  RHC,  we  expect  that  it  will  have  the  best  performance.  Indeed,  optimal 
control  law  in  exhibited  a  large  tracking  error  while  the  single  and  multi-model  achieved 
better  tracking  accuracy,  with  similar  performances.  The  closed- loop  response  of  the  three 
approaches  are  shown  in  Fig.  6.10(b). 

In  S3,  the  three  approaches  were  tested  with  the  sinusoidal  parametric  uncertainty  and 
a  time-delay  for  the  control  inputs.  Due  to  the  significant  parametric  variation  and  the  un¬ 
modeled  dynamics,  the  optimal  control  law  without  model  uncertainties  exhibits  significant 
errors  in  tracking  the  reference  motion.  The  results  show  that  the  multi-model  trajectory 
optimizer  performed  better  compared  to  the  single-model  approach  since  it  contained  more 
expressive  models  of  the  uncertainties.  The  closed-loop  response  of  the  three  approaches 
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- Desired 

- Peters 


(a)  SI 

- Desired  - Desired 

- Peters  - Peters 


(b)  S2  (c)  S3 


Figure  6.10:  Comparison  of  the  operational  space  control  law  of  Peters  [1],  robust  single¬ 
model  RHC,  and  robust  multi-model  RHC.  Four  scenarios  S1-S3  were  setup  using  various 
combinations  of  initial  conditions,  external  disturbances,  parametric  uncertainties,  and 
unmodeled  dynamics  listed  in  Table  6.4 

are  shown  in  Fig.  6.10(c). 

For  both  parametric  uncertainties  and  unmodeled  dynamics,  we  also  conducted  exten¬ 
sive  evaluations  on  the  performance  of  the  proposed  approaches  in  response  to  parametric 
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Figure  6.11:  Sum  of  squared  errors  for  the  operational  space  control. 


values  within  a  specific  structural  class.  Fig.  6.11  shows  the  sum  of  squared  tracking  er¬ 
rors  for  the  three  approaches  with  respect  to  the  desired  end-effector  trajectory  (circle). 
The  average  errors  are  obtained  over  batches  of  20  simulations,  and  are  plotted  against 
parameter  variations  in  a  specific  type  of  structural  uncertainty,  described  below. 
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Fig.  6.11(a)  shows  the  tracking  errors  with  respect  to  the  amount  of  delay  in  the  system, 
varying  from  0  to  20  ms.  In  the  nominal  model  with  no  delay,  the  optimal  control  law 
outperformed  both  the  single  and  two  model  robust  RHCs.  As  the  amount  of  delay  is 
increased,  the  optimal  control  design  based  on  the  nominal  model  exhibited  increasing 
tracking  errors.  The  single  and  two  model  RHCs  both  performance  well  in  the  presence  of 
delays,  with  a  slight  performance  edge  for  the  two  model  RHC  for  larger  delays. 

Fig.  6.11(b)  shows  the  tracking  errors  with  respect  to  a  sinusoidal  parametric  uncer¬ 
tainty  in  the  mass  matrix  of  the  manipulator.  As  the  magnitude  of  the  uncertainty  in¬ 
creases,  we  expect  robust  RHCs  to  outperform  the  optimal  design  based  on  the  nominal 
model.  Indeed,  the  single  model  RHC  performed  the  best  as  the  designed  parametric  vari¬ 
ation  matched  the  true  uncertainty  in  the  system,  while  the  two  model  RHC  resulted  in 
average  performance  in  the  spectrum,  but  best  when  the  amount  of  uncertainty  was  large. 

Fig.  6.11(c)  and  Fig.  6.11(d)  show  the  results  of  another  set  of  simulations  where  there  is 
a  gravity  mismatch  between  the  control  design  and  reality.  In  Fig.  6.11(c)  ,  the  true  gravity 
was  held  at  9.8  m/s 2  while  the  assumed  gravity  in  the  control  design  was  varied  from  1-3  Gs. 
This  implies  that  in  the  gravity  compensation  component  of  the  controller,  more  torques 
were  supplied  than  the  necessary  amount  to  achieve  gravity  compensation.  Fig.  6.11(d) 
simulated  the  opposite  scenario,  where  the  assumed  gravity  in  the  control  design  was  held 
constant  at  9.8  m/s 2  while  the  true  gravity  varied  from  1-3  Gs.  This  implies  that  the 
amount  of  control  authority  in  the  control  design  was  insufficient  in  achieving  effective 
gravity  compensating.  Overall,  the  results  are  as  expected  -  the  optimal  design  based 
on  the  nominal  model  of  gravity  was  sensitive  to  variations  in  gravity  and  the  tracking 
errors  correlated  positively  as  the  mismatch  in  gravity  was  increased.  The  single  and  two 
model  RHCs  performed  about  the  same  in  both  scenarios,  effectively  compensating  for  the 
variations  in  gravity. 
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Figure  6.12:  Taken  from  Boston  Dynamics  Atlas  Robot  Operation  and  Maintenance  Man¬ 
ual,  ATLAS-01-0018- v2.0,  courtesy  of  Boston  Dynamics. 


6.4  Atlas  humanoid  push  recovery 

To  test  the  limits  of  the  proposed  robust  trajectory  optimizers  in  terms  of  dimensionality, 
we  chose  a  reduced-order  full-body  model  for  the  Atlas  bipedal  humanoid  robot  from 
Boston  Dynamics,  currently  used  in  the  DARPA  Robotics  Challenge.  The  Atlas  robot  is 
a  hydraulic  humanoid  with  near-human  anthropometry.  The  body  weighs  approximately 
160  kg  and  contains  28  hydraulically-actuated  degrees  of  freedom,  including  two  arms,  legs, 
feet  and  a  torso.  The  Atlas  robot  and  its  kinematic  configuration  is  shown  in  Fig.  6.12. 
Table  6.5  contains  the  ranges  for  joint  motions  for  each  limb,  in  the  X,  Y,  Z  directions. 

In  the  existing  literature,  studies  such  as  Stephens  [2]  have  shown  push  recovery  strate¬ 
gies  such  as  ankle,  hip,  and  stepping,  subject  to  the  magnitude  of  the  external  disturbance. 
For  small  pushes,  the  robot  can  apply  corrective  torques  at  the  ankle  joint  to  compensate 
for  the  translation  of  the  CoM.  For  larger  disturbances,  a  torso  rotation  in  conjunction  with 
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Table  6.5:  Atlas  humanoid  robot  joint  limits 


DoF 

X(°) 

Y  (°) 

z  0 

Min 

Max 

Range 

Min 

Max 

Range 

Min 

Max 

Range 

Neck 

1 

- 

- 

- 

-34.5 

+65.5 

100 

- 

- 

- 

Back 

3 

-30 

+30 

60 

-12.6 

+30.9 

43.5 

-38 

+38 

76 

Shoulder 

3 

-90 

+90 

180 

- 

- 

- 

-45 

+90 

135 

Elbow 

1 

-135 

0 

135 

0 

+180 

180 

- 

- 

- 

Wrist 

2 

-67.5 

67.5 

135 

0 

+180 

180 

- 

- 

- 

Hip 

3 

-30 

+30 

60 

-92.4 

+37.7 

130.1 

-45 

+10 

55 

Knee 

1 

- 

- 

- 

0 

I  135 

135 

- 

- 

- 

Ankle 

2 

-25 

+25 

50 

-52 

+28 

80 

- 

- 

- 

ankle  torques  can  significantly  improve  the  performance  of  the  push  recovery  controller. 
For  significant  disturbances  that  are  not  recoverable  using  the  ankle  and/or  hip  strategies, 
a  stepping  motion  leads  to  an  instantaneous  shift  in  the  CoM  position  and  can  stabilize 
the  robot.  These  strategies  are  illustrated  in  Fig.  6.13. 

For  this  reduced-order  Atlas  model,  we  are  interested  in  obtaining  human-like  emergent 
behavior  from  an  optimization  process.  Instead  of  manually  defining  recovery  strategies 
under  the  ankle/hip/stepping  formulation,  we  formulate  an  optimization  problem  that  ex¬ 
plicitly  models  a  bounded  external  disturbance  and  other  internal  parametric  uncertainties. 
We  show  that  under  this  optimization-based  approach,  we  can  generate  recovery  behaviors 
automatically,  depending  on  the  assumed  upper  bound  of  the  disturbance. 

The  mathematical  modeling  of  the  dynamics  is  similar  to  the  three-link  and  six-link 
models  described  in  the  previous  sections.  We  also  model  contacts  on  the  robot,  which 
requires  additional  constraints  on  the  system.  We  follow  the  approach  used  in  Posa  [138]. 
The  joint  space  of  the  robot  is  defined  in  generalized  coordinates,  which  in  this  case,  is 
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1.  2.  3. 

Figure  6.13:  Ankle,  hip,  and  stepping  strategies  for  push  recovery.  Illustration  courtesy  of 
Ben  Stephens  [2], 

individual  actuated  degree  of  freedom  in  the  robot.  The  nonlinear  equations  of  motion  is 
similar  to  (6.3),  in  the  form 

M {q)q  +  C(q,  q)q  +  G(q)  =  B(c{)t  +  JT A,  (6.31) 

where  M  is  the  mass  matrix  that  is  square,  symmetric,  and  positive  definite;  C  consists  of 
centripetal  and  Coriolis  forces;  G  represents  the  gravitational  and  other  forces  acting  on 
the  robot;  B  is  the  input  mapping  for  r,  a  vector  of  torques  that  are  the  controlled  inputs 
to  the  system;  A  is  a  vector  of  constraint  forces  acting  along  the  surface  normal,  which  is 
projected  by  the  Jacobian  matrix  J  into  the  joint  space  as 

dq  ’ 
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As  with  (6.3),  this  equation  can  be  placed  in  a  form  consistent  with 

x{t)  =  [f(x)  +  A f(x,  9,  f)]  +  g^wit)  +  [g2(x)  +  A g2(x,  9,  f)]  u(t) 
z(t)  =  [hi(x),u(t)]T 

x(to)  =  x0,  (6.33) 


subject  to  the  non-penetration  constraint 


4>{<l)  >  0, 


(6.34) 


where  the  equality 


Mq)  =  0 


hold  iff  the  i-th  constraint  is  active.  The  complementarity  constraint 


(6.35) 


A  >  0 

4 bT(q)X  =  0  (6.36) 

ensures  that  the  contact  forces  can  be  non-zero  iff  the  bodies  are  in  contact. 

For  the  purposes  of  dimensionality  reduction  in  generating  locomotive  behaviors,  we 
lock  down  the  upper  body  of  the  robot  and  treat  the  neck,  shoulder,  elbow  and  wrist 
joints  of  the  robot  as  fixed.  The  remaining  15  degrees  of  freedom  consists  of  the  back  (3), 
hip  (6),  knee  (2),  and  ankle  (4).  The  state  vector  of  the  system,  which  consists  of  q  and 
q,  contains  30  states.  The  actual  equations  of  motion  are  generated  using  the  Dynamic 
Animation  and  Robotics  Toolkit  (DART),  which  computes  Lagrange’s  equations  derived 
from  D’Alembert’s  principle,  with  full  access  to  internal  kinematic  and  dynamic  quantities, 
such  as  the  mass  matrix,  Coriolis  and  centrifugal  forces. 

To  solve  this  30-dimensional  optimization  problem,  we  first  follow  the  procedure  de- 
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tailed  in  Posa  [138]  and  Anitescu  [139],  which  relaxes  constraints  of  the  form 

G(x)  >  0 
H(x)  >0 

Gt(x)H(x)  A  =  0  (6.37) 

into  the  form 

G(x)  >  0 
H(x)  >  0 

Gi(x)Hi(x) X  <  0,  (6.38) 

where  a  point-wise  evaluation  is  used.  In  addition,  to  improve  numerical  stability,  we 
introduce  additional  slack  variables  a  and  f3  in  the  form 

a,  ft  >  0 
a  =  G(x) 

P  =  H(x) 

aifii  <  0.  (6.39) 

Overall,  this  optimization  framework,  as  applied  to  the  reduced-order  model  of  Atlas, 
results  in  the  generation  of  different  push  recovery  strategies  according  to  the  assumed 
bounds  of  the  external  disturbances  and  other  internal  parametric  variations.  Fig.  6.14 
shows  several  examples  of  generated  strategies.  Unlike  manually  defined  push  recovery 
control,  these  motions  are  generated  using  the  whole-body  model  of  the  robot  without 
explicit  reference  to  any  particular  joint.  As  a  result,  multiple  joints  are  often  used  simul¬ 
taneously  to  achieve  an  overall  behavior,  and  we  use  the  dominant  joint  motions  to  broadly 
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(a)  Ankle  strategy  (b)  Hip  strategy  (c)  Stepping  strategy 

Figure  6.14:  Snapshot  of  optimized  behaviors  for  push  recovery.  The  ankle  strategy  com¬ 
pensates  for  small  external  disturbances  by  using  ankle  torques.  The  hip  strategy  rotates 
the  torso  in  the  direction  of  the  push  to  recover  from  medium  disturbances.  For  large 
disturbances,  stepping  forwards  shifts  the  CoM  location  forward  into  the  support  polygon. 

interpreted  behaviors  as  ankle,  hip,  and  stepping  strategies  in  the  description  below. 

Fig.  6.15(a)  shows  the  the  center  of  mass  trajectories  in  the  forward  direction  in  each 
of  the  three  generated  strategies,  corresponding  to  the  screenshots  in  Fig.  6.14.  For  small 
external  disturbances,  the  "ankle"  strategy  accommodates  forward  CoM  translations  of 
around  0.05m,  using  mainly  torques  applied  at  the  ankle.  For  moderate  external  distur¬ 
bances  which  results  from  a  forward  CoM  motion  of  0.08m,  the  "hip"  strategy  is  more 
effective  in  stabilizing  the  body  back  into  equilibrium.  For  large  disturbances,  the  robot 
achieves  stabilization  by  stepping,  and  instead  of  regulating  the  CoM  back  to  the  original, 
the  CoM  moves  forward  to  the  0.14m  position.  Fig.  6.15(b)  shows  the  corresponding  torso 
motions  for  each  of  the  strategies.  In  both  the  ankle  and  stepping  strategies,  there  is  little 
to  no  motion  for  the  torso  in  the  forward  direction.  In  the  hip  strategy,  the  robot  actively 
rotates  the  torso  forward  by  about  0.4  rad,  then  regulates  the  torso  position  back  to  the 
equilibrium  states. 
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Figure  6.15:  Center  of  mass  and  torso  trajectories  for  the  ankle,  hip,  and  stepping  strate¬ 
gies.  In  both  the  ankle  and  stepping  strategies,  there  is  little  to  no  motion  for  the  torso  in 
the  forward  direction.  In  the  hip  strategy,  the  robot  actively  rotates  the  torso  forward  by 
about  0.4  rad,  then  regulates  the  torso  position  back  to  the  equilibrium  states. 


6.5  Discussion 


In  all  of  the  example  problems  demonstrated  in  this  chapter,  trajectory  optimization  within 
a  receding  horizon  control  framework  was  shown  to  be  an  effective  approach  in  robustifying 
controller  design  against  external  disturbances,  parametric  uncertainties,  and  unmodeled 
dynamics.  In  particular,  the  use  of  multiple  models  is  an  effective  and  flexible  way  of  rep¬ 
resenting  uncertainties,  where  each  model  can  have  entirely  different  structural  properties. 
For  example,  In  the  multi-model  RHC  case  in  Fig.  6.6(d),  the  nominal  model  was  aug¬ 
mented  by  a  second,  higher  dimensional  model,  which  accounted  for  high-order  dynamics. 
The  concatenation  of  a  different  state  space  would  not  have  been  possible  with  a  single 
model. 

Since  the  proposed  approaches  are  based  on  approximate  methods  using  the  spectral 
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approximation  framework  presented  in  Chapter  2,  the  solutions  obtained  only  approximates 
the  true  globally  optimal  solutions  that  can  be  obtained  by  solving  the  robust  control 
problem  analytically.  Hence,  it  is  important  to  characterize  the  quality  of  the  solutions 
obtained.  The  quality  of  the  solution  is  controlled  by  two  factors:  the  initial  guesses 
supplied  to  the  solver,  and  the  number  of  collocation  points  used  in  approximating  the 
trajectories. 

The  single  and  multi- model  RHC  are  both  based  on  direct  trajectory  optimization 
presented  in  Section  4.3,  which  place  a  softer  requirement  on  the  quality  of  the  initial 
guesses  as  compared  to  the  indirect  methods  presented  in  Section  4.2.  For  low-dimensional 
problem  such  as  Section  6.2  and  Section  6.3,  we  supplied  random  initial  guesses  to  the  solver 
to  produce  the  trajectories.  Due  to  the  hybrid  nature  of  the  push  recovery  experiments 
in  Section  6.4,  the  requirements  for  initial  guesses,  especially  in  the  stepping  strategy,  was 
more  important.  In  general,  for  larger  problems,  the  quality  of  the  initial  guess  will  be 
important,  and  as  discussed  in  [140],  the  obtained  trajectories  tend  to  be  homotopic  to 
the  initial  trajectories  and  it  is  difficult  to  recover  from  an  initial  guess  in  a  bad  homotopy 
class. 

The  second  factor  in  the  solution  quality  is  the  number  of  collocation  points  used  in 
approximating  the  trajectories.  In  Fig.  6.16,  we  examined  the  metric  for  solution  quality, 
the  objective  function  value,  with  respect  to  the  number  of  collocation  points  used.  For 
the  problem  setups  in  Section  6.2,  with  a  state-space  dimension  of  up  to  18,  the  optimal 
number  of  collocation  points  is  20,  beyond  which  no  improvements  to  the  objective  function 
value  can  be  obtained.  For  larger  scale  problems,  especially  those  that  are  highly  nonlinear, 
we  expect  that  a  higher  number  of  collocation  points  will  be  required,  though  it  is  difficult 
to  provide  a  quantitative  analysis  as  the  results  are  highly  problem-specific. 

The  number  of  collocation  points  is  an  important  factor  in  the  computational  efficiency 
of  the  trajectory  optimizers.  Computational  efficiency  represents  a  trade-off  with  solution 
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Figure  6.16:  Computation  time  (seconds)  and  objective  function  value  vs.  the  number  of 
collocation  points. 

quality,  since  as  the  number  of  collocation  points  is  increased,  computational  cost  also 
increases.  In  Fig.  6.16,  we  examined  the  computation  time  with  respect  to  the  number  of 
collocation  points  used  and  obtained  a  super-linear  relationship.  The  increase  from  3  to 
30  collocation  points  led  to  a  corresponding  linear  increase  in  the  computation  time,  from 
5  to  33  seconds.  We  expect  this  relationship  to  hold  for  larger-scale  problems. 

For  low-dimensional  problem  presented  in  Section  6.2  and  Section  6.3,  no  initial  guesses 
were  required  to  generate  the  solutions  in  any  of  the  batch  simulations  conducted.  This  is  a 
clear  advantage  of  a  direct  trajectory  optimization  setup,  as  opposed  to  the  indirect  variety 
using  first-order  necessary  conditions  (Section  4.2).  In  Section  6.4,  the  30-dimensional 
optimization  problem  required  initial  guesses  in  order  the  generate  the  three  strategies  in 
Fig.  6.13.  This  is  likely  due  to  the  complexities  of  the  underlying  dynamics,  in  terms  of 
both  nonlinearity  and  dimensionality.  For  large  systems  such  as  this,  we  expect  that  good 
initial  guesses  will  be  an  important  component,  even  for  direct  trajectory  optimization. 
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There  is  a  clear  need  for  both  effective  initial  guesses  and  reducing  the  computational 
burdens  associated  with  a  large  number  of  collocation  points.  An  effective  strategy  is  to 
start  out  the  optimization  with  random  initial  guesses,  with  a  low  number  of  collocation 
points,  then  using  the  optimization  result  to  jump  start  a  second  optimization  iteration 
with  a  high  number  of  collocation  points.  In  high  dimensional  examples  such  as  those 
in  Section  6.4,  we  found  this  to  be  a  particularly  effective  way  of  producing  high  quality 
solutions  while  reducing  computation  time. 

An  important  drawback  to  the  proposed  trajectory  optimization-based  RHC  is  the 
computational  cost.  All  experiments  in  Section  6  were  conducted  in  MATLAB  with  a 
Quad  Core  desktop  computer  with  8GB  of  RAM.  A  single  trajectory  for  a  six-dimensional 
problem  in  Section  6.2  requires  45s  to  compute,  the  12-dimensional  problem  in  Section  6.3 
requires  25m,  and  the  30-dimensional  problem  in  Section  6.4  require  7h.  These  figures  are 
dependent  on  the  number  of  collocation  points  used,  and  the  computational  efficiency  of  the 
multiple-model  versions  of  these  problems  scale  near  linearly  with  respect  to  dimensionality. 
These  numbers  are  consistent  with  the  computation  times  reported  in  other  trajectory- 
optimization  based  approaches  [141],  but  clearly,  they  cannot  currently  be  implemented  in 
a  real-time  RHC  setting.  The  target  implementation  for  these  trajectory-based  approaches 
is  atrajectory  library  approach  such  as  [142],  where  the  expensive  trajectory  optimization 
step  is  computed  offline  and  interpolated  online. 

There  are  several  reasons  why  the  proposed  robust  trajectory  optimization  implemen¬ 
tations  are  slow.  First,  the  underlying  optimization  problem  is  difficult  in  that  it  is  an 
adversarial  optimization  problem,  which  is  more  harder  to  compute  compared  to  a  tra¬ 
ditional  NLP-based  trajectory  optimizer.  In  Chapter  4,  we  have  presented  techniques  to 
reduce  a  nested  optimization  setup,  which  already  improved  the  computational  efficiency, 
but  there  remains  additional  computational  burden  due  to  the  nature  of  the  problem 
formulation.  Second,  the  examples  presented  in  this  chapter  have  varying  levels  of  dynam¬ 
ics  -  from  simple  dynamics  in  Section  6.2  that  can  be  derived  by  hand,  to  complicated 
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multi-body  systems  in  Section  6.4  that  must  be  computed  algorithmically.  The  under¬ 
lying  computational  times  factored  in  the  computation  for  the  dynamics  and  the  actual 
computational  complexity  of  the  trajectory  optimizer  will  likely  be  reduced  given  simple 
dynamics.  Finally,  the  software  implementation  based  on  MATLAB  very  likely  impacted 
the  computational  efficiency  significantly.  Given  a  native  implementation  and  multi-core 
setup,  the  underlying  RHC  can  be  parallelized  effectively  to  obtain  a  linear  speed-up. 

Despite  the  seemingly  inefficient  computation,  the  proposed  trajectory  optimization  re¬ 
mains  orders  of  magnitude  faster  than  previous  implementations.  A  alternative  implemen¬ 
tation  of  trajectory  optimization  without  using  the  complementarity  conditions  in  (4.25) 
is  solved  in  two  parts  according  to  (6.7):  an  inner  maximization  problem  and  an  outer 
minimization  problem.  The  running  time  for  this  naive  approach  is  approximately  12  hours 
for  a  six-dimensional  problem,  compared  to  under  one  minute  for  our  MCP  formulation, 
for  the  same  number  of  collocation  points. 

A  key  assumption  in  the  differential  game  formulation  to  robust  control  is  the  notation 
of  rationality  -  that  this  adversarial  optimization  problem  is  played  to  the  equilibrium  state 
by  rational  decision-makers  u*  and  w*.  In  reality,  not  all  robust  control  situations  satisfy 
this  assumption,  which  ultimately  lead  to  sub-optimal  controller  performance,  or  controller 
conservatism. 

First,  the  stabilizing  player  assumes  the  existence  of  an  opposing  player.  When  this 
assumption  is  violated,  for  example,  in  the  case  of  no  uncertainties  or  modeling  errors  in 
the  system,  then  the  controller  performance  is  sub-optimal.  This  can  be  seen  in  Fig.  6.11, 
where  conservative  robust  control  designs  based  on  single  or  model  models  significantly 
under  performs  an  optimal  control  design.  Theoretically,  this  is  well  understood  in  that 
the  goal  of  any  robust  control  design  is  not  to  perform  best  given  any  one  model,  but  to 
perform  well  in  the  average  case,  across  a  spectrum  of  modeling  errors.  In  the  proposed 
framework,  we  can  guarantee  that  without  an  opposing  player,  the  minimization  control 
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design  will  be  stable,  which  directly  arises  from  the  definition  of  the  £2- gain  in  (3.27).  We 
cannot  guarantee  the  level  of  conservatism  in  the  control  design,  which  is  a  function  of  the 
assumed  magnitude  of  the  system  uncertainties. 

Similarly,  in  the  case  of  an  irrational  opponent,  we  can  only  guarantee  stability,  but 
not  the  level  of  conservatism  in  the  control  design.  In  the  existing  literature  for  decision 
theory,  the  issue  of  irrationality  has  been  addressed  extensively.  Unlike  the  classical  models 
of  expected  utility  [143]  [144],  observed  decision-making  behaviors  often  exhibit  systematic 
deviations,  such  as  risk/loss  aversio,  inaccurately  weighting  low/high  probability  events, 
and  displaying  greater  disutility  for  losses  compared  to  equal-amount  gains.  Prospect 
theory  [145]  is  a  descriptive  model  that  incorporates  many  irrational  biases  and  can  serve  as 
a  starting  point  for  stochastic  models.  In  this  work,  with  the  dynamics  being  deterministic 
as  opposed  to  stochastic,  it  is  unclear  what  alternative  formulations  for  irrationality  can  be. 
One  potential  idea  is  the  formulation  of  stochastic  dynamics  for  robust  control,  where  two 
opposing  players  can  elect  to  have  irrational  strategies  through  the  stochastic  component 
portion  of  the  system. 


6.6  Summary 

In  this  chapter,  we  provided  experimental  validations  of  the  robust  control  designs  pro¬ 
posed  in  Chapter  3,  Chapter  4,  and  Chapter  5.  Through  three  example  related  to  the 
control  of  humanoid  robots,  we  tested  the  proposed  control  designs  under  varying  degrees 
of  complexity  for  the  underlying  dynamic  systems.  The  results  show  that  the  proposed 
robust  trajectory  optimizers  are  effective  in  compensating  for  uncertainties  such  as  ex¬ 
ternal  disturbances,  parametric  variations,  and  unmodeled  dynamics,  even  for  complex 
nonlinear  systems  with  high  dimensionality.  We  provided  an  extensive  discussion  of  the 
computational  aspects  of  robust  trajectory  optimization  and  highlighted  the  significant 
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achievements  in  tractable  designs  for  robust  control. 
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7 

Conclusions 


7.1  Overview 

In  this  thesis,  we  explored  the  topic  of  augmenting  feedback  control  designs  to  compensate 
for  uncertainties,  both  internal  and  external  to  a  dynamic  system.  This  problem  was 
approached  in  a  new  direction:  instead  of  handling  uncertainties  implicitly  or  learning 
from  data,  we  can  formulate  explicit  mathematical  models  of  uncertainties  and  achieve 
robustness  in  control  design.  Specifically,  this  thesis  addressed  three  keys  questions: 

•  Modeling:  how  do  we  model  uncertainties  such  as  external  disturbances,  parametric 
variations,  and  unmodeled  dynamics  in  a  nonlinear  dynamic  system? 

•  Optimization:  how  do  we  extend  a  traditional  nonlinear  optimal  control  framework 
to  consider  these  uncertainties? 

•  Computation:  How  do  we  solve  this  new  robust  optimization  problem  in  a  computa¬ 
tionally  tractable  way  for  nonlinear  systems  with  high  dimensionality? 

7.2  Design  guidelines 

This  thesis,  along  with  many  other  recent  advances  in  robust  control,  have  opened  the  pos¬ 
sibility  of  designing  robust  controllers  for  practical  applications.  For  potential  practitioners 
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of  robust  control,  we  have  the  following  design  guidelines  on  the  appropriate  methods  to 
consider  given  specific  parameters  for  the  underlying  system.  We  focus  our  discussion  on 
four  aspects  of  system  dynamics:  the  nonlinearity  of  the  dynamics,  the  complexity  of  the 
equations  of  motion,  the  dimensionality  of  the  state  space,  and  the  inclusion  of  constraints 
for  the  states  and  inputs. 

Table  7.1:  Robust  control  design  guidelines 


Dynamics 

Complexity 

Dimensionality 

Constraints 

Design 

Linear 

- 

Low 

None 

SOS  or  RHC 

Linear 

- 

High 

None 

RHC 

Linear 

- 

- 

Yes 

Treat  as  nonlinear  system 

Nonlinear 

Any 

Low 

Any 

Dynamic  programming 

Nonlinear 

Simple 

High 

None 

Indirect  trajectory  optimization 

Nonlinear 

Simple 

High 

Yes 

Direct  trajectory  optimization 

Nonlinear 

Complex 

High 

No 

Direct  trajectory  optimization 

Nonlinear 

Complex 

High 

Yes 

Direct  trajectory  optimization 

For  linear  systems  or  nonlinear  systems  that  can  be  linearized,  existing  tools  such 
as  linear-matrix  inequalities  and  sum-of-squares  optimization  can  be  used  to  formulate 
an  approximate  feedback  control  design  based  on  analytical  closed-formed  expressions. 
These  techniques  are  reviewed  in  Section  3.3.  For  high  dimensional  systems,  receding- 
horizon  control  is  an  effective  way  of  combining  simple  controller  to  achieve  robustness 
(see  Section  1.2.6).  For  linear  systems  with  constraints,  in  most  cases,  it  should  be  treated 
as  a  nonlinear  system. 

For  nonlinear  systems  where  the  dimensionality  of  the  state  space  is  low,  a  dynamic 
programming-based  approach  can  be  considered,  regardless  of  other  factors.  While  the 
global  necessary  and  sufficient  conditions  derived  in  Section  3.2  would  be  difficult  to  solve, 
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even  for  simple  problems,  numerical  approaches  based  on  either  direct  policy/ value  it¬ 
eration  or  a  value  function  approximation  scheme  (see  Section  3.3)  can  be  used  if  the 
dimensionality  of  the  problem  is  sufficiently  small. 

For  high-dimensional  nonlinear  systems,  a  trajectory  optimization  approach  is  recom¬ 
mended.  When  the  dynamics  is  simple  and  without  associated  state/input  constraints, 
an  indirect  approach  (see  Section  4.2)  can  be  used.  For  all  other  cases,  direct  trajectory 
optimization  (see  Section  4.3)  is  the  only  available  tool  for  control  design. 

In  terms  of  uncertainties,  the  dynamic  programming  and  trajectory  optimization-based 
approaches  can  address  structural  uncertainties,  both  internal  and  external  to  the  system. 
For  unmodeled  dynamics,  a  multi- model  design  is  appropriate  (see  Chapter  5). 


7.3  Future  work 

We  believe  that  continued  work  along  these  lines  will  result  in  more  effective  robust  con¬ 
trol  designs  for  nonlinear  systems  to  enable  greater  autonomy  for  complex  robots  such  as 
humanoids.  This  work  is  a  general  framework  which  can  be  extended  in  many  ways,  incor¬ 
porating  other  ways  of  modeling  norm-bounded  uncertainties  and  alternative  formulations 
of  differential  games  with  different  underlying  assumptions  of  available  information.  The 
proposed  trajectory  optimization  method  can  be  used  in  conjunction  with  a  global  plan¬ 
ner,  which  can  be  extended  to  incorporate  environmental  constraints  and  uncertainties. 
Outside  robustness  issues,  the  proposed  approaches  can  be  extended  to  other  forms  of  dif¬ 
ferential  games,  such  as  pursuit  evasion  games,  and  demonstrate  applicability  in  domains 
such  as  economics  and  finance. 
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